Dietary intake, Nutritional status, and Health outcomes among Vegan, Vegetarian and Omnivore families: results from the observational study
Statistical report - diet prediction (elastic net)
Authors and affiliations
Marina Heniková1,2, Anna Ouřadová1, Eliška Selinger1,3, Filip Tichanek4, Petra Polakovičová4, Dana Hrnčířová2, Pavel Dlouhý2, Martin Světnička5, Eva El-Lababidi5, Jana Potočková1, Tilman Kühn6, Monika Cahová4, Jan Gojda1
1 Department of Internal Medicine, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic.
2 Department of Hygiene, Third Faculty of Medicine, Charles University, Prague, Czech Republic.
3 National Health Institute, Prague, Czech Republic.
4 Institute for Clinical and Experimental Medicine, Prague, Czech Republic.
5 Department of Pediatrics, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic.
6 Department of Epidemiology, MedUni, Vienna, Austria.
This is a statistical report of the study currenlty under review in the Communications Medicine journal.
When using this code or data, cite the original publication:
TO BE ADDED
BibTex citation for the original publication:
TO BE ADDED
Original GitHub repository: https://github.com/filip-tichanek/kompas_clinical
Statistical reports can be found on the reports hub.
Data analysis is described in detail in the statistical methods report.
1 Introduction
This project is designed to evaluate and compare clinical outcomes across three distinct dietary strategy groups:
- Vegans
- Vegetarians
- Omnivores
The dataset includes both adults and children, with data clustered within families.
1.1 Main Questions
The study addresses the following key questions:
Q1. Do clinical outcomes vary significantly across different diet strategies?
Q2. Beyond diet group, which factors (e.g., sex, age, breastfeeding status for children, or supplementation when applicable) most strongly influence clinical outcomes? How correlated (“clustered”) are these characteristics within the same family?
Q3. Could the clinical characteristics effectively discriminate between different diet groups?
1.2 Statistical Methods
For full methodological details, see this report. In brief:
Robust linear mixed-effects models (rLME) were used to estimate adjusted differences between diet groups (Q1) and assess the importance of other variables (Q2), including how much clinical characteristics tend to cluster within families. Covariates included age, sex, breastfeeding status for children, and relevant supplementation factors where applicable.
Elastic net logistic regression was employed to answer Q3, evaluating whether clinical characteristics provide a strong overall signal distinguishing between diet groups, incorporating a predictive perspective.
All analyses were conducted separately for adults and children.
Open code
# Import initiation file
source('r/368_initiation.R')2 Data wrangling
2.1 All children
Open code
skimr::skim(dat_child_all)| Name | dat_child_all |
| Number of rows | 142 |
| Number of columns | 75 |
| _______________________ | |
| Column type frequency: | |
| factor | 4 |
| numeric | 71 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| ID | 0 | 1 | FALSE | 142 | K10: 1, K10: 1, K10: 1, K11: 1 |
| FAM | 0 | 1 | FALSE | 95 | R25: 3, R41: 3, R55: 3, R70: 3 |
| SEX | 0 | 1 | FALSE | 2 | F: 73, M: 69 |
| GRP | 0 | 1 | FALSE | 3 | VN: 61, OM: 44, VG: 37 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| aAGE | 0 | 1.00 | 3.34 | 2.46 | 0.18 | 1.34 | 2.74 | 4.88 | 12.04 | ▇▅▃▁▁ |
| aBreastFeed_full | 0 | 1.00 | 6.04 | 2.10 | 0.00 | 6.00 | 6.00 | 6.00 | 18.00 | ▁▇▁▁▁ |
| aBreastFeed_total | 13 | 0.91 | 15.90 | 10.14 | 0.00 | 9.21 | 14.00 | 19.00 | 60.00 | ▇▇▂▁▁ |
| aBreastFeed_full_stopped | 0 | 1.00 | 0.92 | 0.27 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▁▁▇ |
| aBreastFeed_total_stopped | 0 | 1.00 | 0.68 | 0.47 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
| dev_delay | 0 | 1.00 | 0.02 | 0.14 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aMASS_Perc | 0 | 1.00 | 46.85 | 28.80 | 0.00 | 21.00 | 49.00 | 68.50 | 99.00 | ▇▆▃▇▅ |
| aHEIGHT_Perc | 0 | 1.00 | 45.35 | 27.47 | 0.00 | 19.50 | 44.00 | 68.75 | 100.00 | ▇▇▅▇▃ |
| aBMI_PERC | 0 | 1.00 | 50.50 | 24.96 | 0.00 | 34.25 | 50.00 | 68.00 | 100.00 | ▃▅▇▅▃ |
| aM_per_H_PERC | 0 | 1.00 | 49.98 | 24.85 | 0.00 | 33.00 | 51.00 | 66.75 | 99.00 | ▃▃▇▆▃ |
| aGLY | 10 | 0.93 | 4.45 | 0.59 | 3.26 | 4.16 | 4.42 | 4.72 | 7.49 | ▂▇▁▁▁ |
| aTC | 10 | 0.93 | 3.78 | 0.66 | 2.21 | 3.30 | 3.87 | 4.21 | 5.56 | ▂▆▇▅▁ |
| aHDL | 10 | 0.93 | 1.26 | 0.33 | 0.66 | 1.01 | 1.25 | 1.42 | 2.16 | ▆▇▇▂▂ |
| aLDL | 10 | 0.93 | 2.05 | 0.57 | 0.24 | 1.69 | 2.02 | 2.41 | 3.48 | ▁▃▇▆▂ |
| aTG | 10 | 0.93 | 1.06 | 0.60 | 0.30 | 0.66 | 0.92 | 1.22 | 3.57 | ▇▅▁▁▁ |
| aCa | 9 | 0.94 | 2.57 | 0.12 | 2.35 | 2.48 | 2.56 | 2.65 | 2.88 | ▃▇▆▅▁ |
| aP | 9 | 0.94 | 1.66 | 0.18 | 1.30 | 1.54 | 1.67 | 1.79 | 2.10 | ▃▆▇▅▁ |
| aMg | 9 | 0.94 | 0.86 | 0.07 | 0.71 | 0.81 | 0.85 | 0.90 | 1.17 | ▃▇▃▁▁ |
| aSe | 11 | 0.92 | 0.79 | 0.28 | 0.27 | 0.60 | 0.77 | 0.94 | 2.02 | ▅▇▃▁▁ |
| aZn | 11 | 0.92 | 11.56 | 2.41 | 7.00 | 10.15 | 11.30 | 12.80 | 21.30 | ▃▇▃▁▁ |
| aFE | 9 | 0.94 | 13.37 | 6.94 | 1.60 | 8.30 | 12.60 | 17.80 | 32.70 | ▆▇▇▂▁ |
| aVKFE | 10 | 0.93 | 71.66 | 9.12 | 48.00 | 66.00 | 72.00 | 77.00 | 103.00 | ▂▆▇▂▁ |
| aFERR | 9 | 0.94 | 18.62 | 15.03 | 3.10 | 10.40 | 15.20 | 21.70 | 133.70 | ▇▁▁▁▁ |
| aTRF | 9 | 0.94 | 2.84 | 0.36 | 1.90 | 2.60 | 2.85 | 3.05 | 4.10 | ▂▆▇▂▁ |
| aSATTRF | 9 | 0.94 | 18.90 | 9.89 | 2.10 | 11.60 | 18.10 | 24.90 | 43.70 | ▅▇▆▂▂ |
| aTRFINDEX | 9 | 0.94 | 1.55 | 0.74 | 0.59 | 1.13 | 1.38 | 1.70 | 5.41 | ▇▃▁▁▁ |
| aSTRF | 9 | 0.94 | 1.67 | 0.33 | 1.08 | 1.43 | 1.62 | 1.81 | 2.98 | ▅▇▂▁▁ |
| aHGB | 14 | 0.90 | 121.94 | 9.48 | 99.00 | 115.00 | 121.00 | 127.25 | 157.00 | ▁▇▆▂▁ |
| aMCV | 14 | 0.90 | 78.81 | 3.83 | 68.20 | 76.77 | 79.45 | 81.10 | 88.30 | ▂▃▇▇▁ |
| aPTH | 10 | 0.93 | 2.88 | 1.25 | 0.70 | 1.90 | 2.80 | 3.60 | 7.80 | ▆▇▃▁▁ |
| aCros | 11 | 0.92 | 1.26 | 0.31 | 0.57 | 1.08 | 1.23 | 1.42 | 2.71 | ▃▇▃▁▁ |
| aP1NP | 11 | 0.92 | 720.87 | 294.64 | 215.00 | 482.00 | 631.00 | 959.50 | 1200.99 | ▂▇▅▂▅ |
| aUI | 30 | 0.79 | 176.10 | 142.86 | 6.00 | 86.95 | 138.00 | 228.50 | 911.00 | ▇▃▁▁▁ |
| aUREA | 9 | 0.94 | 4.21 | 1.24 | 1.80 | 3.40 | 4.20 | 5.10 | 7.70 | ▅▇▇▃▁ |
| aCREA | 9 | 0.94 | 25.99 | 7.50 | 15.00 | 20.00 | 25.00 | 31.00 | 51.00 | ▇▅▅▁▁ |
| aUA | 9 | 0.94 | 226.97 | 48.32 | 78.00 | 197.00 | 224.00 | 257.00 | 391.00 | ▁▅▇▃▁ |
| aVIT_AKTB12 | 11 | 0.92 | 127.83 | 82.79 | 31.50 | 77.75 | 108.70 | 144.05 | 493.90 | ▇▃▁▁▁ |
| aHCY | 23 | 0.84 | 8.69 | 3.09 | 3.90 | 6.85 | 8.10 | 9.75 | 25.80 | ▇▆▁▁▁ |
| aMMA | 19 | 0.87 | 286.37 | 354.72 | 67.30 | 148.95 | 187.70 | 258.25 | 2980.40 | ▇▁▁▁▁ |
| aVIT_D | 9 | 0.94 | 92.73 | 30.42 | 36.90 | 69.80 | 87.60 | 110.20 | 194.20 | ▅▇▅▂▁ |
| aFOLAT | 10 | 0.93 | 16.41 | 4.66 | 4.50 | 13.05 | 16.90 | 19.65 | 24.04 | ▁▅▇▇▆ |
| aIGF1 | 20 | 0.86 | 110.37 | 63.87 | 24.60 | 63.60 | 92.35 | 136.98 | 318.00 | ▇▆▃▁▁ |
| aUr_Ca | 18 | 0.87 | 1.55 | 1.55 | 0.25 | 0.37 | 1.02 | 2.26 | 7.41 | ▇▂▁▁▁ |
| aCa_per_Krea | 18 | 0.87 | 0.48 | 0.51 | 0.03 | 0.12 | 0.31 | 0.64 | 2.75 | ▇▂▁▁▁ |
| aI_per_Krea | 31 | 0.78 | 553.90 | 748.54 | 60.27 | 189.31 | 351.80 | 563.53 | 5887.55 | ▇▁▁▁▁ |
| aP_per_Krea | 18 | 0.87 | 3.54 | 2.13 | 0.14 | 1.71 | 3.14 | 4.73 | 9.24 | ▇▇▆▂▂ |
| aAL_child | 0 | 1.00 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aBREAKS | 0 | 1.00 | 0.04 | 0.18 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_VEG1 | 0 | 1.00 | 0.05 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSup_B12 | 0 | 1.00 | 0.50 | 0.50 | 0.00 | 0.00 | 0.50 | 1.00 | 1.00 | ▇▁▁▁▇ |
| aSUP_FOL | 0 | 1.00 | 0.04 | 0.18 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_vitA | 0 | 1.00 | 0.01 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_vitB1 | 0 | 1.00 | 0.04 | 0.18 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_vit.B5 | 0 | 1.00 | 0.04 | 0.18 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_D | 0 | 1.00 | 0.72 | 0.45 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
| aSUP_Mg | 0 | 1.00 | 0.05 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_Zn | 0 | 1.00 | 0.04 | 0.18 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_Se | 0 | 1.00 | 0.02 | 0.14 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_Ca | 0 | 1.00 | 0.04 | 0.18 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_Fe | 0 | 1.00 | 0.07 | 0.26 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_Iod | 0 | 1.00 | 0.06 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_Ѡ3 | 0 | 1.00 | 0.40 | 0.49 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| aSUP_CHLO | 0 | 1.00 | 0.02 | 0.14 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_ALG | 0 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ▁▁▇▁▁ |
| aSUP_GB | 0 | 1.00 | 0.01 | 0.08 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_FORT | 0 | 1.00 | 0.01 | 0.08 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_PROB | 0 | 1.00 | 0.06 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| aSUP_OTH | 0 | 1.00 | 0.21 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| aBiW | 2 | 0.99 | 3336.26 | 541.50 | 2000.00 | 3047.50 | 3400.00 | 3657.50 | 5000.00 | ▂▅▇▃▁ |
| log2_age | 0 | 1.00 | 1.29 | 1.25 | -2.49 | 0.43 | 1.45 | 2.29 | 3.59 | ▁▃▆▇▅ |
| aBreastFeed_full_duration | 0 | 1.00 | 5.64 | 2.61 | 0.00 | 5.50 | 6.00 | 6.00 | 18.00 | ▁▇▁▁▁ |
2.1.1 Wrangle data
Open code
for (i in 1:ncol(dat_child_all)) {
names(dat_child_all)[i] <- str_replace_all(
names(dat_child_all)[i], c("^a" = "")
)
}
dat_child_all <- dat_child_all %>%
rename(
FOLATE = FOLAT,
TIBC = VKFE,
HOLOTC = VIT_AKTB12,
CTX = Cros,
uCCR = Ca_per_Krea,
uICR = I_per_Krea,
uPCR = P_per_Krea,
uCa = Ur_Ca
) %>%
mutate(
log2_TG = log2(TG),
log2_Se = log2(Se),
log2_Zn = log2(Zn),
log2_FERR = log2(FERR),
log2_TRFINDEX = log2(TRFINDEX),
log2_STRF = log2(STRF),
log2_CTX = log2(CTX),
log2_UI = log2(UI),
log2_CREA = log2(CREA),
log2_HOLOTC = log2(HOLOTC),
log2_HCY = log2(HCY),
log2_MMA = log2(MMA),
log2_VIT_D = log2(VIT_D),
log2_IGF1 = log2(IGF1),
log2_uCa = log2(uCa),
log2_uCCR = log2(uCCR),
log2_uICR = log2(uICR)
) %>%
mutate(
male = if_else(SEX == "M", 1, 0),
GRP = factor(GRP),
FAM = factor(FAM)
) %>%
select(
## remove irrelevant variables not included to any inference and prediction
-c(SEX, BreastFeed_full, BreastFeed_total, ID, dev_delay, BREAKS, AGE),
## remove information about supplementation
-c(SUP_VEG1:SUP_OTH),
## remove non-log2-transformed version when log2(variable) is included
-c(TG, Se, Zn, FERR, TRFINDEX,
STRF, CTX, UI, CREA, HOLOTC, HCY,
MMA, VIT_D, IGF1, uCa, uCCR, uICR
)
) %>%
select(FAM, GRP, log2_age, male, BiW,
BreastFeed_full_duration, BreastFeed_full_stopped, BreastFeed_total_stopped,
everything()
)2.1.2 Missing values and standardization
Open code
skimr::skim(dat_child_all)| Name | dat_child_all |
| Number of rows | 142 |
| Number of columns | 49 |
| _______________________ | |
| Column type frequency: | |
| factor | 2 |
| numeric | 47 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| FAM | 0 | 1 | FALSE | 95 | R25: 3, R41: 3, R55: 3, R70: 3 |
| GRP | 0 | 1 | FALSE | 3 | VN: 61, OM: 44, VG: 37 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| log2_age | 0 | 1.00 | 1.29 | 1.25 | -2.49 | 0.43 | 1.45 | 2.29 | 3.59 | ▁▃▆▇▅ |
| male | 0 | 1.00 | 0.49 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| BiW | 2 | 0.99 | 3336.26 | 541.50 | 2000.00 | 3047.50 | 3400.00 | 3657.50 | 5000.00 | ▂▅▇▃▁ |
| BreastFeed_full_duration | 0 | 1.00 | 5.64 | 2.61 | 0.00 | 5.50 | 6.00 | 6.00 | 18.00 | ▁▇▁▁▁ |
| BreastFeed_full_stopped | 0 | 1.00 | 0.92 | 0.27 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▁▁▇ |
| BreastFeed_total_stopped | 0 | 1.00 | 0.68 | 0.47 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
| MASS_Perc | 0 | 1.00 | 46.85 | 28.80 | 0.00 | 21.00 | 49.00 | 68.50 | 99.00 | ▇▆▃▇▅ |
| HEIGHT_Perc | 0 | 1.00 | 45.35 | 27.47 | 0.00 | 19.50 | 44.00 | 68.75 | 100.00 | ▇▇▅▇▃ |
| BMI_PERC | 0 | 1.00 | 50.50 | 24.96 | 0.00 | 34.25 | 50.00 | 68.00 | 100.00 | ▃▅▇▅▃ |
| M_per_H_PERC | 0 | 1.00 | 49.98 | 24.85 | 0.00 | 33.00 | 51.00 | 66.75 | 99.00 | ▃▃▇▆▃ |
| GLY | 10 | 0.93 | 4.45 | 0.59 | 3.26 | 4.16 | 4.42 | 4.72 | 7.49 | ▂▇▁▁▁ |
| TC | 10 | 0.93 | 3.78 | 0.66 | 2.21 | 3.30 | 3.87 | 4.21 | 5.56 | ▂▆▇▅▁ |
| HDL | 10 | 0.93 | 1.26 | 0.33 | 0.66 | 1.01 | 1.25 | 1.42 | 2.16 | ▆▇▇▂▂ |
| LDL | 10 | 0.93 | 2.05 | 0.57 | 0.24 | 1.69 | 2.02 | 2.41 | 3.48 | ▁▃▇▆▂ |
| Ca | 9 | 0.94 | 2.57 | 0.12 | 2.35 | 2.48 | 2.56 | 2.65 | 2.88 | ▃▇▆▅▁ |
| P | 9 | 0.94 | 1.66 | 0.18 | 1.30 | 1.54 | 1.67 | 1.79 | 2.10 | ▃▆▇▅▁ |
| Mg | 9 | 0.94 | 0.86 | 0.07 | 0.71 | 0.81 | 0.85 | 0.90 | 1.17 | ▃▇▃▁▁ |
| FE | 9 | 0.94 | 13.37 | 6.94 | 1.60 | 8.30 | 12.60 | 17.80 | 32.70 | ▆▇▇▂▁ |
| TIBC | 10 | 0.93 | 71.66 | 9.12 | 48.00 | 66.00 | 72.00 | 77.00 | 103.00 | ▂▆▇▂▁ |
| TRF | 9 | 0.94 | 2.84 | 0.36 | 1.90 | 2.60 | 2.85 | 3.05 | 4.10 | ▂▆▇▂▁ |
| SATTRF | 9 | 0.94 | 18.90 | 9.89 | 2.10 | 11.60 | 18.10 | 24.90 | 43.70 | ▅▇▆▂▂ |
| HGB | 14 | 0.90 | 121.94 | 9.48 | 99.00 | 115.00 | 121.00 | 127.25 | 157.00 | ▁▇▆▂▁ |
| MCV | 14 | 0.90 | 78.81 | 3.83 | 68.20 | 76.77 | 79.45 | 81.10 | 88.30 | ▂▃▇▇▁ |
| PTH | 10 | 0.93 | 2.88 | 1.25 | 0.70 | 1.90 | 2.80 | 3.60 | 7.80 | ▆▇▃▁▁ |
| P1NP | 11 | 0.92 | 720.87 | 294.64 | 215.00 | 482.00 | 631.00 | 959.50 | 1200.99 | ▂▇▅▂▅ |
| UREA | 9 | 0.94 | 4.21 | 1.24 | 1.80 | 3.40 | 4.20 | 5.10 | 7.70 | ▅▇▇▃▁ |
| UA | 9 | 0.94 | 226.97 | 48.32 | 78.00 | 197.00 | 224.00 | 257.00 | 391.00 | ▁▅▇▃▁ |
| FOLATE | 10 | 0.93 | 16.41 | 4.66 | 4.50 | 13.05 | 16.90 | 19.65 | 24.04 | ▁▅▇▇▆ |
| uPCR | 18 | 0.87 | 3.54 | 2.13 | 0.14 | 1.71 | 3.14 | 4.73 | 9.24 | ▇▇▆▂▂ |
| AL_child | 0 | 1.00 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| log2_TG | 10 | 0.93 | -0.10 | 0.71 | -1.74 | -0.60 | -0.12 | 0.29 | 1.84 | ▂▆▇▃▁ |
| log2_Se | 11 | 0.92 | -0.43 | 0.51 | -1.89 | -0.74 | -0.38 | -0.10 | 1.01 | ▁▃▇▅▁ |
| log2_Zn | 11 | 0.92 | 3.50 | 0.29 | 2.81 | 3.34 | 3.50 | 3.68 | 4.41 | ▂▆▇▂▁ |
| log2_FERR | 9 | 0.94 | 3.92 | 0.90 | 1.63 | 3.38 | 3.93 | 4.44 | 7.06 | ▂▆▇▂▁ |
| log2_TRFINDEX | 9 | 0.94 | 0.52 | 0.54 | -0.76 | 0.18 | 0.46 | 0.77 | 2.44 | ▂▇▆▁▁ |
| log2_STRF | 9 | 0.94 | 0.72 | 0.27 | 0.11 | 0.52 | 0.70 | 0.86 | 1.58 | ▂▇▇▂▁ |
| log2_CTX | 11 | 0.92 | 0.29 | 0.36 | -0.81 | 0.11 | 0.30 | 0.50 | 1.44 | ▁▃▇▃▁ |
| log2_UI | 30 | 0.79 | 7.03 | 1.25 | 2.58 | 6.44 | 7.11 | 7.84 | 9.83 | ▁▁▆▇▂ |
| log2_CREA | 9 | 0.94 | 4.64 | 0.40 | 3.91 | 4.32 | 4.64 | 4.95 | 5.67 | ▇▇▇▆▁ |
| log2_HOLOTC | 11 | 0.92 | 6.76 | 0.81 | 4.98 | 6.28 | 6.76 | 7.17 | 8.95 | ▂▆▇▃▁ |
| log2_HCY | 23 | 0.84 | 3.05 | 0.45 | 1.96 | 2.78 | 3.02 | 3.29 | 4.69 | ▁▇▆▁▁ |
| log2_MMA | 19 | 0.87 | 7.75 | 0.91 | 6.07 | 7.22 | 7.55 | 8.01 | 11.54 | ▂▇▁▁▁ |
| log2_VIT_D | 9 | 0.94 | 6.46 | 0.46 | 5.21 | 6.13 | 6.45 | 6.78 | 7.60 | ▁▅▇▅▁ |
| log2_IGF1 | 20 | 0.86 | 6.56 | 0.83 | 4.62 | 5.99 | 6.53 | 7.10 | 8.31 | ▂▆▇▇▃ |
| log2_uCa | 18 | 0.87 | -0.06 | 1.47 | -2.00 | -1.43 | 0.04 | 1.18 | 2.89 | ▇▅▅▆▂ |
| log2_uCCR | 18 | 0.87 | -1.78 | 1.50 | -5.14 | -3.00 | -1.68 | -0.65 | 1.46 | ▂▆▇▆▂ |
| log2_uICR | 31 | 0.78 | 8.46 | 1.28 | 5.91 | 7.56 | 8.46 | 9.14 | 12.52 | ▃▇▇▂▁ |
Removing features with over 25% NA values: all variables are below this threshold, no will be excluded
2.1.2.1 Multiple imputation
Prepare
Open code
dat_child_all_toimp <- dat_child_all %>% select(-FAM)
init <- mice(dat_child_all_toimp, maxit = 0)
init$method
## GRP log2_age male
## "" "" ""
## BiW BreastFeed_full_duration BreastFeed_full_stopped
## "pmm" "" ""
## BreastFeed_total_stopped MASS_Perc HEIGHT_Perc
## "" "" ""
## BMI_PERC M_per_H_PERC GLY
## "" "" "pmm"
## TC HDL LDL
## "pmm" "pmm" "pmm"
## Ca P Mg
## "pmm" "pmm" "pmm"
## FE TIBC TRF
## "pmm" "pmm" "pmm"
## SATTRF HGB MCV
## "pmm" "pmm" "pmm"
## PTH P1NP UREA
## "pmm" "pmm" "pmm"
## UA FOLATE uPCR
## "pmm" "pmm" "pmm"
## AL_child log2_TG log2_Se
## "" "pmm" "pmm"
## log2_Zn log2_FERR log2_TRFINDEX
## "pmm" "pmm" "pmm"
## log2_STRF log2_CTX log2_UI
## "pmm" "pmm" "pmm"
## log2_CREA log2_HOLOTC log2_HCY
## "pmm" "pmm" "pmm"
## log2_MMA log2_VIT_D log2_IGF1
## "pmm" "pmm" "pmm"
## log2_uCa log2_uCCR log2_uICR
## "pmm" "pmm" "pmm"
data_imputed <- run(
expr = mice(
dat_child_all_toimp,
method = init$method,
m = 1
),
path = "gitignore/run/data_child_all_imputed"
)
dat_child_all_imputed <- data.frame(
FAM = dat_child_all$FAM,
complete(data_imputed, 'all')[1]$`1`
)2.1.2.2 Rescaling
Open code
dat_child_all_imputed <- dat_child_all_imputed %>%
mutate(
across(
where(~ is.numeric(.) && length(unique(.)) > 2),
arm::rescale
)
)
summary(dat_child_all_imputed)
## FAM GRP log2_age male BiW
## R25 : 3 OM:44 Min. :-1.51597 Min. :0.0000 Min. :-1.24365
## R41 : 3 VG:37 1st Qu.:-0.34675 1st Qu.:0.0000 1st Qu.:-0.26830
## R55 : 3 VN:61 Median : 0.06553 Median :0.0000 Median : 0.06022
## R70 : 3 Mean : 0.00000 Mean :0.4859 Mean : 0.00000
## R91 : 3 3rd Qu.: 0.39903 3rd Qu.:1.0000 3rd Qu.: 0.30846
## R11 : 2 Max. : 0.92159 Max. :1.0000 Max. : 1.54041
## (Other):125
## BreastFeed_full_duration BreastFeed_full_stopped BreastFeed_total_stopped
## Min. :-1.08165 Min. :0.0000 Min. :0.0000
## 1st Qu.:-0.02701 1st Qu.:1.0000 1st Qu.:0.0000
## Median : 0.06887 Median :1.0000 Median :1.0000
## Mean : 0.00000 Mean :0.9225 Mean :0.6761
## 3rd Qu.: 0.06887 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. : 2.36990 Max. :1.0000 Max. :1.0000
##
## MASS_Perc HEIGHT_Perc BMI_PERC M_per_H_PERC
## Min. :-0.81336 Min. :-0.82527 Min. :-1.01169 Min. :-1.00579
## 1st Qu.:-0.44880 1st Qu.:-0.47038 1st Qu.:-0.32554 1st Qu.:-0.34169
## Median : 0.03729 Median :-0.02448 Median :-0.01002 Median : 0.02055
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.37581 3rd Qu.: 0.42597 3rd Qu.: 0.35058 3rd Qu.: 0.33751
## Max. : 0.90530 Max. : 0.99471 Max. : 0.99165 Max. : 0.98652
##
## GLY TC HDL LDL
## Min. :-0.96932 Min. :-1.1611 Min. :-0.897027 Min. :-1.5591
## 1st Qu.:-0.24605 1st Qu.:-0.3518 1st Qu.:-0.381160 1st Qu.:-0.3186
## Median :-0.02522 Median : 0.0427 Median :-0.008379 Median :-0.0237
## Mean : 0.00000 Mean : 0.0000 Mean : 0.000000 Mean : 0.0000
## 3rd Qu.: 0.23005 3rd Qu.: 0.3271 3rd Qu.: 0.274030 3rd Qu.: 0.3124
## Max. : 2.45859 Max. : 1.2979 Max. : 1.362247 Max. : 1.2515
##
## Ca P Mg FE
## Min. :-0.95206 Min. :-1.04671 Min. :-1.07088 Min. :-0.85319
## 1st Qu.:-0.40979 1st Qu.:-0.35459 1st Qu.:-0.30075 1st Qu.:-0.36807
## Median :-0.03437 Median : 0.02031 Median :-0.02071 Median :-0.05248
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.34105 3rd Qu.: 0.36637 3rd Qu.: 0.31185 3rd Qu.: 0.33165
## Max. : 1.25874 Max. : 1.26036 Max. : 2.14964 Max. : 1.39025
##
## TIBC TRF SATTRF HGB
## Min. :-1.28220 Min. :-1.29718 Min. :-0.86511 Min. :-1.1969
## 1st Qu.:-0.33960 1st Qu.:-0.38031 1st Qu.:-0.37882 1st Qu.:-0.3516
## Median : 0.02925 Median : 0.03263 Median :-0.04314 Median :-0.0346
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.30246 3rd Qu.: 0.29859 3rd Qu.: 0.29509 3rd Qu.: 0.2824
## Max. : 1.72320 Max. : 1.78238 Max. : 1.25873 Max. : 1.8674
##
## MCV PTH P1NP UREA
## Min. :-1.27228 Min. :-0.86222 Min. :-0.8720 Min. :-0.95114
## 1st Qu.:-0.23179 1st Qu.:-0.37956 1st Qu.:-0.4112 1st Qu.:-0.33321
## Median : 0.06504 Median :-0.03767 Median :-0.1493 Median :-0.01939
## Mean : 0.00000 Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.29939 3rd Qu.: 0.30421 3rd Qu.: 0.5204 3rd Qu.: 0.34505
## Max. : 1.23989 Max. : 1.99353 Max. : 0.7624 Max. : 1.43836
##
## UA FOLATE uPCR AL_child
## Min. :-1.4333036 Min. :-1.31482 Min. :-0.7806 Min. :0.0000
## 1st Qu.:-0.3100253 1st Qu.:-0.36494 1st Qu.:-0.4357 1st Qu.:0.0000
## Median :-0.0000698 Median : 0.05861 Median :-0.1030 Median :0.0000
## Mean : 0.0000000 Mean : 0.00000 Mean : 0.0000 Mean :0.1197
## 3rd Qu.: 0.3074059 3rd Qu.: 0.33510 3rd Qu.: 0.2779 3rd Qu.:0.0000
## Max. : 1.6712097 Max. : 0.81004 Max. : 1.2758 Max. :1.0000
##
## log2_TG log2_Se log2_Zn log2_FERR
## Min. :-1.16637 Min. :-1.44955 Min. :-1.176518 Min. :-1.223479
## 1st Qu.:-0.36102 1st Qu.:-0.30729 1st Qu.:-0.259301 1st Qu.:-0.309078
## Median :-0.03236 Median : 0.04957 Median :-0.009786 Median : 0.002261
## Mean : 0.00000 Mean : 0.00000 Mean : 0.000000 Mean : 0.000000
## 3rd Qu.: 0.27003 3rd Qu.: 0.33111 3rd Qu.: 0.322212 3rd Qu.: 0.291327
## Max. : 1.31588 Max. : 1.42922 Max. : 1.534604 Max. : 1.684594
##
## log2_TRFINDEX log2_STRF log2_CTX log2_UI
## Min. :-1.17406 Min. :-1.13215 Min. :-1.48388 Min. :-1.62194
## 1st Qu.:-0.31618 1st Qu.:-0.38289 1st Qu.:-0.26053 1st Qu.:-0.21541
## Median :-0.05137 Median :-0.03349 Median : 0.03195 Median : 0.05194
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.22979 3rd Qu.: 0.27543 3rd Qu.: 0.31093 3rd Qu.: 0.33609
## Max. : 1.74128 Max. : 1.57692 Max. : 1.58523 Max. : 1.06570
##
## log2_CREA log2_HOLOTC log2_HCY log2_MMA
## Min. :-0.8901 Min. :-1.0930 Min. :-1.209351 Min. :-0.9160
## 1st Qu.:-0.4414 1st Qu.:-0.2816 1st Qu.:-0.311324 1st Qu.:-0.3108
## Median :-0.1205 Median : 0.0110 Median :-0.004265 Median :-0.1061
## Mean : 0.0000 Mean : 0.0000 Mean : 0.000000 Mean : 0.0000
## 3rd Qu.: 0.4168 3rd Qu.: 0.2642 3rd Qu.: 0.236266 3rd Qu.: 0.1607
## Max. : 1.3131 Max. : 1.3796 Max. : 1.691373 Max. : 1.9540
##
## log2_VIT_D log2_IGF1 log2_uCa log2_uCCR
## Min. :-1.381056 Min. :-1.0593846 Min. :-0.67077 Min. :-1.14730
## 1st Qu.:-0.366847 1st Qu.:-0.3340024 1st Qu.:-0.45358 1st Qu.:-0.41602
## Median :-0.009946 Median :-0.0003994 Median : 0.03941 Median : 0.03259
## Mean : 0.000000 Mean : 0.0000000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.342680 3rd Qu.: 0.3380598 3rd Qu.: 0.41488 3rd Qu.: 0.37986
## Max. : 1.233721 Max. : 1.0670245 Max. : 1.05893 Max. : 1.09810
##
## log2_uICR
## Min. :-0.99271
## 1st Qu.:-0.33192
## Median :-0.01509
## Mean : 0.00000
## 3rd Qu.: 0.27115
## Max. : 1.59665
## 2.2 Adults
2.2.1 Wrangle data
Open code
for (i in 1:ncol(dat_adult)) {
names(dat_adult)[i] <- str_replace_all(
names(dat_adult)[i], c("^a" = "")
)
}
dat_adult <- dat_adult %>%
rename(
FOLATE = FOLAT,
TIBC = VKFE,
HOLOTC = VIT_AKTB12,
CTX = Cros,
uCCR = Ca_per_Krea,
uICR = I_per_Krea,
uPCR = P_per_Krea,
uCa = Ur_Ca
) %>%
mutate(
log2_FAT = log2(FAT),
log2_FFM = log2(FFM),
log2_TC = log2(TC),
log2_TG = log2(TG),
log2_Se = log2(Se),
log2_Zn = log2(Zn),
log2_TIBC = log2(TIBC),
log2_FERR = log2(FERR),
log2_TRF = log2(TRF),
log2_TRFINDEX = log2(TRFINDEX),
log2_STRF = log2(STRF),
log2_CTX = log2(CTX),
log2_P1NP = log2(P1NP),
log2_UI = log2(UI),
log2_UREA = log2(UREA),
log2_CREA = log2(CREA),
log2_HOLOTC = log2(HOLOTC),
log2_HCY = log2(HCY),
log2_MMA = log2(MMA),
log2_VIT_D = log2(VIT_D),
log2_uCa = log2(uCa),
log2_uCCR = log2(uCCR),
log2_uICR = log2(uICR)
) %>%
mutate(
male = if_else(SEX == "M", 1, 0),
GRP = factor(GRP),
FAM = factor(FAM)
) %>%
select(
## remove irrelevant variables not included to any inference and prediction
-c(SEX, ID),
## remove information about supplementation
-c(SUP_VEG1:SUP_OTH),
## remove non-log2-transformed version when log2(variable) is included
-c(
FAT, FFM, TC, TG, Se, Zn, TIBC, FERR, TRF, TRFINDEX, STRF,
CTX, P1NP, UI, UREA, CREA, HOLOTC, HCY, MMA, VIT_D,
uCa, uCCR, uICR
)
) %>%
select(
FAM, GRP, AGE, male,
everything()
)
summary(dat_adult)
## FAM GRP AGE male BMI
## R1 : 2 OM:50 Min. :21.54 Min. :0.000 Min. :15.89
## R10 : 2 VG:45 1st Qu.:32.47 1st Qu.:0.000 1st Qu.:21.22
## R11 : 2 VN:92 Median :35.32 Median :0.000 Median :23.49
## R12 : 2 Mean :35.54 Mean :0.492 Mean :23.85
## R13 : 2 3rd Qu.:38.27 3rd Qu.:1.000 3rd Qu.:25.98
## R14 : 2 Max. :56.56 Max. :1.000 Max. :34.76
## (Other):175
## WHR HEIGHT MASS HG
## Min. :0.6480 Min. :1.531 Min. : 46.80 Min. :20.75
## 1st Qu.:0.7368 1st Qu.:1.690 1st Qu.: 62.60 1st Qu.:30.75
## Median :0.7843 Median :1.742 Median : 70.40 Median :39.90
## Mean :0.7890 Mean :1.747 Mean : 73.09 Mean :42.12
## 3rd Qu.:0.8341 3rd Qu.:1.804 3rd Qu.: 82.65 3rd Qu.:52.20
## Max. :1.0000 Max. :1.983 Max. :123.50 Max. :76.25
## NA's :4 NA's :1
## PFAT SBP DBP GLY
## Min. : 2.70 Min. : 82.0 Min. : 46.00 Min. :3.100
## 1st Qu.:16.05 1st Qu.:112.5 1st Qu.: 71.00 1st Qu.:4.195
## Median :21.90 Median :123.0 Median : 77.00 Median :4.460
## Mean :22.47 Mean :124.2 Mean : 76.86 Mean :4.454
## 3rd Qu.:28.65 3rd Qu.:135.0 3rd Qu.: 83.00 3rd Qu.:4.720
## Max. :43.90 Max. :179.0 Max. :104.00 Max. :8.410
##
## HDL LDL Ca P
## Min. :0.800 Min. :1.070 Min. :2.180 Min. :0.660
## 1st Qu.:1.215 1st Qu.:2.020 1st Qu.:2.410 1st Qu.:1.005
## Median :1.430 Median :2.440 Median :2.460 Median :1.100
## Mean :1.488 Mean :2.538 Mean :2.456 Mean :1.108
## 3rd Qu.:1.710 3rd Qu.:2.965 3rd Qu.:2.510 3rd Qu.:1.220
## Max. :3.030 Max. :5.960 Max. :2.630 Max. :1.480
##
## Mg FE SATTRF HGB
## Min. :0.6900 Min. : 2.00 Min. : 1.70 Min. : 40.0
## 1st Qu.:0.7700 1st Qu.:14.75 1st Qu.:20.50 1st Qu.:135.5
## Median :0.8100 Median :18.70 Median :27.30 Median :145.0
## Mean :0.8121 Mean :20.04 Mean :29.27 Mean :144.0
## 3rd Qu.:0.8500 3rd Qu.:24.75 3rd Qu.:36.45 3rd Qu.:155.0
## Max. :1.0700 Max. :46.70 Max. :75.20 Max. :182.0
##
## MCV PTH UA FOLATE
## Min. :57.60 Min. :0.800 Min. :138.0 Min. : 3.05
## 1st Qu.:86.00 1st Qu.:2.400 1st Qu.:247.0 1st Qu.: 8.97
## Median :88.40 Median :3.200 Median :296.0 Median :11.59
## Mean :88.61 Mean :3.295 Mean :301.7 Mean :12.46
## 3rd Qu.:91.05 3rd Qu.:4.000 3rd Qu.:349.5 3rd Qu.:15.27
## Max. :99.50 Max. :7.800 Max. :516.0 Max. :24.01
##
## uPCR AL_adult BREAKS log2_FAT
## Min. :0.1565 Min. :0.0000 Min. :0.0000 Min. :0.6781
## 1st Qu.:0.7921 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.4983
## Median :1.3387 Median :0.0000 Median :0.0000 Median :3.9260
## Mean :1.4656 Mean :0.2193 Mean :0.4652 Mean :3.9009
## 3rd Qu.:1.9878 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:4.3889
## Max. :4.7073 Max. :1.0000 Max. :1.0000 Max. :5.3255
## NA's :1
## log2_FFM log2_TC log2_TG log2_Se
## Min. :5.289 Min. :1.395 Min. :-1.7859 Min. :-1.47393
## 1st Qu.:5.545 1st Qu.:1.935 1st Qu.:-0.7859 1st Qu.:-0.35845
## Median :5.755 Median :2.138 Median :-0.3585 Median :-0.02915
## Mean :5.791 Mean :2.128 Mean :-0.2534 Mean :-0.08565
## 3rd Qu.:6.060 3rd Qu.:2.328 3rd Qu.: 0.1110 3rd Qu.: 0.23265
## Max. :6.461 Max. :3.064 Max. : 1.8520 Max. : 1.23266
##
## log2_Zn log2_TIBC log2_FERR log2_TRF
## Min. :2.926 Min. :5.728 Min. :-0.5146 Min. :1.084
## 1st Qu.:3.524 1st Qu.:6.000 1st Qu.: 3.9681 1st Qu.:1.350
## Median :3.689 Median :6.109 Median : 4.7970 Median :1.444
## Mean :3.698 Mean :6.125 Mean : 4.8328 Mean :1.470
## 3rd Qu.:3.848 3rd Qu.:6.248 3rd Qu.: 5.5759 3rd Qu.:1.587
## Max. :4.733 Max. :6.907 Max. : 8.6949 Max. :2.257
##
## log2_TRFINDEX log2_STRF log2_CTX log2_P1NP
## Min. :-1.43440 Min. :-0.43440 Min. :-4.3219 Min. :4.433
## 1st Qu.:-0.62149 1st Qu.: 0.06003 1st Qu.:-1.8651 1st Qu.:5.221
## Median :-0.25154 Median : 0.28094 Median :-1.3328 Median :5.558
## Mean :-0.24387 Mean : 0.27996 Mean :-1.3881 Mean :5.643
## 3rd Qu.: 0.08406 3rd Qu.: 0.43028 3rd Qu.:-0.8612 3rd Qu.:6.059
## Max. : 1.62293 Max. : 4.15381 Max. : 0.5945 Max. :7.322
## NA's :2 NA's :1
## log2_UI log2_UREA log2_CREA log2_HOLOTC
## Min. :-0.737 Min. :0.848 Min. :5.322 Min. :4.392
## 1st Qu.: 5.600 1st Qu.:1.766 1st Qu.:5.833 1st Qu.:6.033
## Median : 6.642 Median :2.070 Median :6.022 Median :6.416
## Mean : 6.401 Mean :2.063 Mean :6.036 Mean :6.439
## 3rd Qu.: 7.375 3rd Qu.:2.322 3rd Qu.:6.219 3rd Qu.:6.815
## Max. : 9.647 Max. :4.000 Max. :7.285 Max. :9.313
## NA's :18
## log2_HCY log2_MMA log2_VIT_D log2_uCa
## Min. :2.963 Min. : 6.349 Min. :4.907 Min. :-2.0006
## 1st Qu.:3.620 1st Qu.: 7.178 1st Qu.:5.898 1st Qu.:-1.7251
## Median :3.907 Median : 7.548 Median :6.186 Median :-0.4546
## Mean :3.928 Mean : 7.624 Mean :6.166 Mean :-0.3721
## 3rd Qu.:4.154 3rd Qu.: 7.889 3rd Qu.:6.438 3rd Qu.: 0.4931
## Max. :6.022 Max. :12.613 Max. :7.314 Max. : 3.0548
## NA's :24 NA's :1
## log2_uCCR log2_uICR
## Min. :-6.3997 Min. : 1.481
## 1st Qu.:-3.3701 1st Qu.: 6.223
## Median :-2.7674 Median : 7.170
## Mean :-2.7535 Mean : 7.180
## 3rd Qu.:-2.1385 3rd Qu.: 8.223
## Max. : 0.2224 Max. :11.964
## NA's :1 NA's :192.2.2 Missing vlaues and stadardization
Open code
skimr::skim(dat_adult)| Name | dat_adult |
| Number of rows | 187 |
| Number of columns | 51 |
| _______________________ | |
| Column type frequency: | |
| factor | 2 |
| numeric | 49 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| FAM | 0 | 1 | FALSE | 95 | R1: 2, R10: 2, R11: 2, R12: 2 |
| GRP | 0 | 1 | FALSE | 3 | VN: 92, OM: 50, VG: 45 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| AGE | 0 | 1.00 | 35.54 | 5.35 | 21.54 | 32.47 | 35.32 | 38.27 | 56.56 | ▂▇▇▁▁ |
| male | 0 | 1.00 | 0.49 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| BMI | 0 | 1.00 | 23.85 | 3.59 | 15.89 | 21.22 | 23.49 | 25.98 | 34.76 | ▂▇▇▂▁ |
| WHR | 4 | 0.98 | 0.79 | 0.07 | 0.65 | 0.74 | 0.78 | 0.83 | 1.00 | ▃▇▆▂▁ |
| HEIGHT | 0 | 1.00 | 1.75 | 0.09 | 1.53 | 1.69 | 1.74 | 1.80 | 1.98 | ▂▅▇▃▂ |
| MASS | 0 | 1.00 | 73.09 | 13.97 | 46.80 | 62.60 | 70.40 | 82.65 | 123.50 | ▅▇▆▁▁ |
| HG | 1 | 0.99 | 42.12 | 12.98 | 20.75 | 30.75 | 39.90 | 52.20 | 76.25 | ▇▆▇▅▁ |
| PFAT | 0 | 1.00 | 22.47 | 8.28 | 2.70 | 16.05 | 21.90 | 28.65 | 43.90 | ▁▆▇▅▂ |
| SBP | 0 | 1.00 | 124.20 | 16.90 | 82.00 | 112.50 | 123.00 | 135.00 | 179.00 | ▂▇▇▃▁ |
| DBP | 0 | 1.00 | 76.86 | 9.30 | 46.00 | 71.00 | 77.00 | 83.00 | 104.00 | ▁▂▇▅▁ |
| GLY | 0 | 1.00 | 4.45 | 0.49 | 3.10 | 4.20 | 4.46 | 4.72 | 8.41 | ▂▇▁▁▁ |
| HDL | 0 | 1.00 | 1.49 | 0.40 | 0.80 | 1.21 | 1.43 | 1.71 | 3.03 | ▅▇▃▁▁ |
| LDL | 0 | 1.00 | 2.54 | 0.71 | 1.07 | 2.02 | 2.44 | 2.96 | 5.96 | ▅▇▃▁▁ |
| Ca | 0 | 1.00 | 2.46 | 0.08 | 2.18 | 2.41 | 2.46 | 2.51 | 2.63 | ▁▂▇▇▃ |
| P | 0 | 1.00 | 1.11 | 0.16 | 0.66 | 1.00 | 1.10 | 1.22 | 1.48 | ▁▃▇▆▂ |
| Mg | 0 | 1.00 | 0.81 | 0.07 | 0.69 | 0.77 | 0.81 | 0.85 | 1.07 | ▃▇▃▁▁ |
| FE | 0 | 1.00 | 20.04 | 8.00 | 2.00 | 14.75 | 18.70 | 24.75 | 46.70 | ▂▇▆▂▁ |
| SATTRF | 0 | 1.00 | 29.27 | 12.61 | 1.70 | 20.50 | 27.30 | 36.45 | 75.20 | ▂▇▅▂▁ |
| HGB | 0 | 1.00 | 143.96 | 14.98 | 40.00 | 135.50 | 145.00 | 155.00 | 182.00 | ▁▁▁▇▃ |
| MCV | 0 | 1.00 | 88.61 | 4.24 | 57.60 | 86.00 | 88.40 | 91.05 | 99.50 | ▁▁▁▇▃ |
| PTH | 0 | 1.00 | 3.29 | 1.18 | 0.80 | 2.40 | 3.20 | 4.00 | 7.80 | ▃▇▅▁▁ |
| UA | 0 | 1.00 | 301.66 | 70.78 | 138.00 | 247.00 | 296.00 | 349.50 | 516.00 | ▂▇▇▃▁ |
| FOLATE | 0 | 1.00 | 12.46 | 4.87 | 3.05 | 8.97 | 11.59 | 15.27 | 24.01 | ▂▇▆▂▂ |
| uPCR | 1 | 0.99 | 1.47 | 0.83 | 0.16 | 0.79 | 1.34 | 1.99 | 4.71 | ▇▇▅▁▁ |
| AL_adult | 0 | 1.00 | 0.22 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| BREAKS | 0 | 1.00 | 0.47 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| log2_FAT | 0 | 1.00 | 3.90 | 0.69 | 0.68 | 3.50 | 3.93 | 4.39 | 5.33 | ▁▁▃▇▃ |
| log2_FFM | 0 | 1.00 | 5.79 | 0.29 | 5.29 | 5.55 | 5.75 | 6.06 | 6.46 | ▇▇▅▇▂ |
| log2_TC | 0 | 1.00 | 2.13 | 0.29 | 1.40 | 1.94 | 2.14 | 2.33 | 3.06 | ▂▇▇▃▁ |
| log2_TG | 0 | 1.00 | -0.25 | 0.71 | -1.79 | -0.79 | -0.36 | 0.11 | 1.85 | ▂▇▆▂▁ |
| log2_Se | 0 | 1.00 | -0.09 | 0.49 | -1.47 | -0.36 | -0.03 | 0.23 | 1.23 | ▁▃▇▅▁ |
| log2_Zn | 0 | 1.00 | 3.70 | 0.27 | 2.93 | 3.52 | 3.69 | 3.85 | 4.73 | ▁▆▇▂▁ |
| log2_TIBC | 0 | 1.00 | 6.13 | 0.20 | 5.73 | 6.00 | 6.11 | 6.25 | 6.91 | ▃▇▃▁▁ |
| log2_FERR | 0 | 1.00 | 4.83 | 1.24 | -0.51 | 3.97 | 4.80 | 5.58 | 8.69 | ▁▁▇▆▁ |
| log2_TRF | 0 | 1.00 | 1.47 | 0.20 | 1.08 | 1.35 | 1.44 | 1.59 | 2.26 | ▃▇▃▁▁ |
| log2_TRFINDEX | 2 | 0.99 | -0.24 | 0.53 | -1.43 | -0.62 | -0.25 | 0.08 | 1.62 | ▂▇▆▂▁ |
| log2_STRF | 1 | 0.99 | 0.28 | 0.41 | -0.43 | 0.06 | 0.28 | 0.43 | 4.15 | ▇▂▁▁▁ |
| log2_CTX | 0 | 1.00 | -1.39 | 0.79 | -4.32 | -1.87 | -1.33 | -0.86 | 0.59 | ▁▂▇▇▂ |
| log2_P1NP | 0 | 1.00 | 5.64 | 0.59 | 4.43 | 5.22 | 5.56 | 6.06 | 7.32 | ▃▇▆▃▁ |
| log2_UI | 18 | 0.90 | 6.40 | 1.59 | -0.74 | 5.60 | 6.64 | 7.38 | 9.65 | ▁▁▂▇▂ |
| log2_UREA | 0 | 1.00 | 2.06 | 0.42 | 0.85 | 1.77 | 2.07 | 2.32 | 4.00 | ▁▇▆▁▁ |
| log2_CREA | 0 | 1.00 | 6.04 | 0.29 | 5.32 | 5.83 | 6.02 | 6.22 | 7.29 | ▂▇▆▁▁ |
| log2_HOLOTC | 0 | 1.00 | 6.44 | 0.69 | 4.39 | 6.03 | 6.42 | 6.82 | 9.31 | ▁▇▇▁▁ |
| log2_HCY | 24 | 0.87 | 3.93 | 0.51 | 2.96 | 3.62 | 3.91 | 4.15 | 6.02 | ▃▇▃▁▁ |
| log2_MMA | 0 | 1.00 | 7.62 | 0.68 | 6.35 | 7.18 | 7.55 | 7.89 | 12.61 | ▇▆▁▁▁ |
| log2_VIT_D | 0 | 1.00 | 6.17 | 0.44 | 4.91 | 5.90 | 6.19 | 6.44 | 7.31 | ▁▃▇▆▁ |
| log2_uCa | 1 | 0.99 | -0.37 | 1.31 | -2.00 | -1.73 | -0.45 | 0.49 | 3.05 | ▇▆▅▂▁ |
| log2_uCCR | 1 | 0.99 | -2.75 | 1.01 | -6.40 | -3.37 | -2.77 | -2.14 | 0.22 | ▁▂▇▆▁ |
| log2_uICR | 19 | 0.90 | 7.18 | 1.64 | 1.48 | 6.22 | 7.17 | 8.22 | 11.96 | ▁▂▇▅▁ |
All features has over 3/4 of non-missing values
2.2.2.1 Multiple imputation
Prepare
Open code
dat_adult_toimp <- dat_adult %>% select(-FAM)
init <- mice(dat_adult_toimp, maxit = 0)
## Warning: Number of logged events: 1
init$method
## GRP AGE male BMI WHR
## "" "" "" "" "pmm"
## HEIGHT MASS HG PFAT SBP
## "" "" "pmm" "" ""
## DBP GLY HDL LDL Ca
## "" "" "" "" ""
## P Mg FE SATTRF HGB
## "" "" "" "" ""
## MCV PTH UA FOLATE uPCR
## "" "" "" "" "pmm"
## AL_adult BREAKS log2_FAT log2_FFM log2_TC
## "" "" "" "" ""
## log2_TG log2_Se log2_Zn log2_TIBC log2_FERR
## "" "" "" "" ""
## log2_TRF log2_TRFINDEX log2_STRF log2_CTX log2_P1NP
## "" "pmm" "pmm" "" ""
## log2_UI log2_UREA log2_CREA log2_HOLOTC log2_HCY
## "pmm" "" "" "" "pmm"
## log2_MMA log2_VIT_D log2_uCa log2_uCCR log2_uICR
## "" "" "pmm" "pmm" "pmm"
data_imputed <- run(
expr = mice(
dat_adult_toimp,
method = init$method,
m = 1
),
path = "gitignore/run/data_adult_imputed"
)
dat_adult_imputed <- data.frame(
FAM = dat_adult$FAM,
complete(data_imputed, 'all')[1]$`1`
)2.2.2.2 Rescaling
Open code
dat_adult_imputed <- dat_adult_imputed %>%
mutate(
across(
where(~ is.numeric(.) && length(unique(.)) > 2),
arm::rescale
)
)
summary(dat_adult_imputed)
## FAM GRP AGE male BMI
## R1 : 2 OM:50 Min. :-1.30803 Min. :0.000 Min. :-1.10918
## R10 : 2 VG:45 1st Qu.:-0.28651 1st Qu.:0.000 1st Qu.:-0.36567
## R11 : 2 VN:92 Median :-0.02015 Median :0.000 Median :-0.04986
## R12 : 2 Mean : 0.00000 Mean :0.492 Mean : 0.00000
## R13 : 2 3rd Qu.: 0.25504 3rd Qu.:1.000 3rd Qu.: 0.29673
## R14 : 2 Max. : 1.96479 Max. :1.000 Max. : 1.51954
## (Other):175
## WHR HEIGHT MASS HG
## Min. :-1.02613 Min. :-1.19707 Min. :-0.9408 Min. :-0.8219
## 1st Qu.:-0.37285 1st Qu.:-0.31572 1st Qu.:-0.3753 1st Qu.:-0.4363
## Median :-0.03565 Median :-0.02748 Median :-0.0961 Median :-0.1202
## Mean : 0.00000 Mean : 0.00000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.32857 3rd Qu.: 0.31619 3rd Qu.: 0.3423 3rd Qu.: 0.3887
## Max. : 1.54683 Max. : 1.30841 Max. : 1.8044 Max. : 1.3178
##
## PFAT SBP DBP GLY
## Min. :-1.19376 Min. :-1.2486 Min. :-1.659628 Min. :-1.368237
## 1st Qu.:-0.38753 1st Qu.:-0.3462 1st Qu.:-0.314955 1st Qu.:-0.262125
## Median :-0.03423 Median :-0.0356 Median : 0.007766 Median : 0.005564
## Mean : 0.00000 Mean : 0.0000 Mean : 0.000000 Mean : 0.000000
## 3rd Qu.: 0.37341 3rd Qu.: 0.3194 3rd Qu.: 0.330487 3rd Qu.: 0.268202
## Max. : 1.29439 Max. : 1.6211 Max. : 1.460013 Max. : 3.995648
##
## HDL LDL Ca P
## Min. :-0.85527 Min. :-1.02864 Min. :-1.7722 Min. :-1.415
## 1st Qu.:-0.33963 1st Qu.:-0.36317 1st Qu.:-0.2950 1st Qu.:-0.325
## Median :-0.07249 Median :-0.06896 Median : 0.0261 Median :-0.025
## Mean : 0.00000 Mean : 0.00000 Mean : 0.0000 Mean : 0.000
## 3rd Qu.: 0.27541 3rd Qu.: 0.29880 3rd Qu.: 0.3472 3rd Qu.: 0.354
## Max. : 1.91552 Max. : 2.39677 Max. : 1.1179 Max. : 1.175
##
## Mg FE SATTRF HGB
## Min. :-0.91127 Min. :-1.12804 Min. :-1.09328 Min. :-3.4692
## 1st Qu.:-0.31440 1st Qu.:-0.33095 1st Qu.:-0.34777 1st Qu.:-0.2822
## Median :-0.01596 Median :-0.08401 Median :-0.07812 Median : 0.0348
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.28248 3rd Qu.: 0.29421 3rd Qu.: 0.28472 3rd Qu.: 0.3685
## Max. : 1.92389 Max. : 1.66645 Max. : 1.82133 Max. : 1.2696
##
## MCV PTH UA FOLATE
## Min. :-3.65974 Min. :-1.06081 Min. :-1.15606 Min. :-0.96640
## 1st Qu.:-0.30764 1st Qu.:-0.38044 1st Qu.:-0.38610 1st Qu.:-0.35870
## Median :-0.02436 Median :-0.04025 Median :-0.03997 Median :-0.08975
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.28842 3rd Qu.: 0.29994 3rd Qu.: 0.33795 3rd Qu.: 0.28749
## Max. : 1.28579 Max. : 1.91583 Max. : 1.51409 Max. : 1.18518
##
## uPCR AL_adult BREAKS log2_FAT
## Min. :-0.79306 Min. :0.0000 Min. :0.0000 Min. :-2.3394
## 1st Qu.:-0.40597 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:-0.2923
## Median :-0.08154 Median :0.0000 Median :0.0000 Median : 0.0182
## Mean : 0.00000 Mean :0.2193 Mean :0.4652 Mean : 0.0000
## 3rd Qu.: 0.31714 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.: 0.3542
## Max. : 1.96700 Max. :1.0000 Max. :1.0000 Max. : 1.0341
##
## log2_FFM log2_TC log2_TG log2_Se
## Min. :-0.85542 Min. :-1.27470 Min. :-1.07569 Min. :-1.42194
## 1st Qu.:-0.41825 1st Qu.:-0.33509 1st Qu.:-0.37376 1st Qu.:-0.27942
## Median :-0.06078 Median : 0.01621 Median :-0.07373 Median : 0.05788
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.45908 3rd Qu.: 0.34690 3rd Qu.: 0.25581 3rd Qu.: 0.32602
## Max. : 1.14396 Max. : 1.62630 Max. : 1.47786 Max. : 1.35028
##
## log2_Zn log2_TIBC log2_FERR log2_TRF
## Min. :-1.41578 Min. :-0.99967 Min. :-2.15715 Min. :-0.97211
## 1st Qu.:-0.31993 1st Qu.:-0.31549 1st Qu.:-0.34884 1st Qu.:-0.30136
## Median :-0.01599 Median :-0.04259 Median :-0.01445 Median :-0.06695
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.27504 3rd Qu.: 0.30795 3rd Qu.: 0.29976 3rd Qu.: 0.29496
## Max. : 1.89867 Max. : 1.96499 Max. : 1.55796 Max. : 1.98082
##
## log2_TRFINDEX log2_STRF log2_CTX log2_P1NP
## Min. :-1.10778 Min. :-0.877703 Min. :-1.8463 Min. :-1.02184
## 1st Qu.:-0.34728 1st Qu.:-0.266816 1st Qu.:-0.3002 1st Qu.:-0.35636
## Median :-0.01598 Median : 0.007279 Median : 0.0348 Median :-0.07218
## Mean : 0.00000 Mean : 0.000000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.29378 3rd Qu.: 0.186510 3rd Qu.: 0.3316 3rd Qu.: 0.35073
## Max. : 1.71417 Max. : 4.751818 Max. : 1.2477 Max. : 1.41750
##
## log2_UI log2_UREA log2_CREA log2_HOLOTC
## Min. :-2.24120 Min. :-1.432231 Min. :-1.21223 Min. :-1.47624
## 1st Qu.:-0.25347 1st Qu.:-0.350970 1st Qu.:-0.34506 1st Qu.:-0.29262
## Median : 0.09545 Median : 0.008282 Median :-0.02349 Median :-0.01657
## Mean : 0.00000 Mean : 0.000000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.29693 3rd Qu.: 0.304705 3rd Qu.: 0.31045 3rd Qu.: 0.27113
## Max. : 0.99762 Max. : 2.282208 Max. : 2.12003 Max. : 2.07264
##
## log2_HCY log2_MMA log2_VIT_D log2_uCa
## Min. :-0.958835 Min. :-0.94318 Min. :-1.43672 Min. :-0.62121
## 1st Qu.:-0.307167 1st Qu.:-0.32979 1st Qu.:-0.30517 1st Qu.:-0.51130
## Median :-0.004007 Median :-0.05622 Median : 0.02283 Median :-0.04482
## Mean : 0.000000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.249839 3rd Qu.: 0.19536 3rd Qu.: 0.31051 3rd Qu.: 0.33192
## Max. : 2.075618 Max. : 3.68813 Max. : 1.30999 Max. : 1.31305
##
## log2_uCCR log2_uICR
## Min. :-1.804807 Min. :-1.78656
## 1st Qu.:-0.304338 1st Qu.:-0.30019
## Median :-0.006875 Median :-0.01302
## Mean : 0.000000 Mean : 0.00000
## 3rd Qu.: 0.303717 3rd Qu.: 0.33061
## Max. : 1.472229 Max. : 1.47788
## 3 Diet prediction
(OM, VN, VG determination)
All prediction models will account for the variable FAM as the grouping (block) variable, ensuring that all subject of a single family will be part of the same fold during each bootstrap iteration.
3.1 All children
3.1.1 Baseline model
Following models will be trained to discriminate between two diets, based solely on log2_age, male, BreastFeed_total_stopped, BreastFeed_full_stopped and interaction BreastFeed_full_duration.
3.1.1.1 Prepare data
Open code
dat_pred <- dat_child_all_imputed %>%
select(GRP, FAM, log2_age, male, BreastFeed_full_duration,
BreastFeed_full_stopped, BreastFeed_total_stopped, BiW)
nfam <- dat_pred$FAM %>% factor() %>% levels() %>% length()
dat_features <- dat_pred %>% select(-FAM, -GRP) %>% as.matrix
colnames(dat_features)
## [1] "log2_age" "male"
## [3] "BreastFeed_full_duration" "BreastFeed_full_stopped"
## [5] "BreastFeed_total_stopped" "BiW"
dat_outcome <- dat_pred %>% select(GRP) %>% as.matrix3.1.1.2 VN vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VG') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.5809524
## Fit and validate model
modelac <- 'pred_baseline_child_all_VN_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.2000000
## lambda 0.4039723
## auc 0.6350596
## auc_optimism_corrected 0.5372320
## auc_optimism_corrected_CIL 0.3586458
## auc_optimism_corrected_CIU 0.6872924
## accuracy 0.5809524
## accuracy_optimism_corrected 0.5327153
## accuracy_optimism_corrected_CIL 0.3468919
## accuracy_optimism_corrected_CIU 0.6775735
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism
## Min. :0.5000 Min. :0.2742 Min. :-0.134138
## 1st Qu.:0.6168 1st Qu.:0.5000 1st Qu.: 0.001095
## Median :0.6739 Median :0.5284 Median : 0.108890
## Mean :0.6582 Mean :0.5372 Mean : 0.120975
## 3rd Qu.:0.7238 3rd Qu.:0.5890 3rd Qu.: 0.202375
## Max. :0.8917 Max. :0.7717 Max. : 0.527510
## accuracy_resamp_test accuracy_resamp_validation accuracy_optimism
## Min. :0.5049 Min. :0.2000 Min. :-0.22660
## 1st Qu.:0.5941 1st Qu.:0.4773 1st Qu.: 0.01389
## Median :0.6281 Median :0.5366 Median : 0.09871
## Mean :0.6304 Mean :0.5327 Mean : 0.09765
## 3rd Qu.:0.6667 3rd Qu.:0.5952 3rd Qu.: 0.17749
## Max. :0.8317 Max. :0.7576 Max. : 0.44119
## Model coefficients
get(modelac)$betas
## 6 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age -0.06622559
## male .
## BreastFeed_full_duration .
## BreastFeed_full_stopped .
## BreastFeed_total_stopped -0.07857001
## BiW .3.1.1.3 VG vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VN') %>%
mutate(outcome = if_else(GRP == 'VG', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.4567901
## Fit and validate model
modelac <- 'pred_baseline_child_all_VG_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.0000000
## lambda 0.1504131
## auc 0.7542998
## auc_optimism_corrected 0.5749206
## auc_optimism_corrected_CIL 0.3285965
## auc_optimism_corrected_CIU 0.7837943
## accuracy 0.6790123
## accuracy_optimism_corrected 0.5097507
## accuracy_optimism_corrected_CIL 0.2815476
## accuracy_optimism_corrected_CIU 0.7030882
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.1732 Min. :-0.1778 Min. :0.5065
## 1st Qu.:0.7422 1st Qu.:0.5000 1st Qu.: 0.1088 1st Qu.:0.6657
## Median :0.7898 Median :0.5825 Median : 0.1972 Median :0.7089
## Mean :0.7718 Mean :0.5749 Mean : 0.1968 Mean :0.7024
## 3rd Qu.:0.8227 3rd Qu.:0.6597 3rd Qu.: 0.2802 3rd Qu.:0.7468
## Max. :0.9410 Max. :0.8681 Max. : 0.5914 Max. :0.8831
## accuracy_resamp_validation accuracy_optimism
## Min. :0.1765 Min. :-0.1658
## 1st Qu.:0.4409 1st Qu.: 0.1102
## Median :0.5172 Median : 0.1903
## Mean :0.5098 Mean : 0.1927
## 3rd Qu.:0.5862 3rd Qu.: 0.2778
## Max. :0.8182 Max. : 0.5610
## Model coefficients
get(modelac)$betas
## 6 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age 0.39328720
## male 0.19015857
## BreastFeed_full_duration 0.71146159
## BreastFeed_full_stopped -1.05623482
## BreastFeed_total_stopped -0.58807096
## BiW -0.067137963.1.1.4 VN vs VG
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'OM') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.622449
## Fit and validate model
modelac <- 'pred_baseline_child_all_VN_VG'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.8000000
## lambda 0.1067586
## auc 0.6637129
## auc_optimism_corrected 0.5923943
## auc_optimism_corrected_CIL 0.3866566
## auc_optimism_corrected_CIU 0.7576386
## accuracy 0.6224490
## accuracy_optimism_corrected 0.6002464
## accuracy_optimism_corrected_CIL 0.4475267
## accuracy_optimism_corrected_CIU 0.7567568
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.1781 Min. :-0.15986 Min. :0.2400
## 1st Qu.:0.6841 1st Qu.:0.5235 1st Qu.: 0.03207 1st Qu.:0.6504
## Median :0.7391 Median :0.6029 Median : 0.12478 Median :0.6848
## Mean :0.7221 Mean :0.5924 Mean : 0.12972 Mean :0.6866
## 3rd Qu.:0.7848 3rd Qu.:0.6619 3rd Qu.: 0.20828 3rd Qu.:0.7263
## Max. :0.8872 Max. :0.8733 Max. : 0.55336 Max. :0.8416
## accuracy_resamp_validation accuracy_optimism
## Min. :0.3889 Min. :-0.29846
## 1st Qu.:0.5526 1st Qu.: 0.01585
## Median :0.6000 Median : 0.08727
## Mean :0.6002 Mean : 0.08634
## 3rd Qu.:0.6471 3rd Qu.: 0.16024
## Max. :0.8667 Max. : 0.41571
## Model coefficients
get(modelac)$betas
## 6 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age -0.29889
## male .
## BreastFeed_full_duration .
## BreastFeed_full_stopped .
## BreastFeed_total_stopped .
## BiW .3.1.2 Complete model
Baseline + clinical characteristics
3.1.2.1 Prepare data
Open code
dat_pred <- dat_child_all_imputed %>%
select(GRP, FAM, log2_age, male, BreastFeed_full_duration,
BreastFeed_full_stopped, BreastFeed_total_stopped,
BiW,
MASS_Perc:log2_uICR)
dat_features <- dat_pred %>% select(-FAM, -GRP) %>% as.matrix
colnames(dat_features)
## [1] "log2_age" "male"
## [3] "BreastFeed_full_duration" "BreastFeed_full_stopped"
## [5] "BreastFeed_total_stopped" "BiW"
## [7] "MASS_Perc" "HEIGHT_Perc"
## [9] "BMI_PERC" "M_per_H_PERC"
## [11] "GLY" "TC"
## [13] "HDL" "LDL"
## [15] "Ca" "P"
## [17] "Mg" "FE"
## [19] "TIBC" "TRF"
## [21] "SATTRF" "HGB"
## [23] "MCV" "PTH"
## [25] "P1NP" "UREA"
## [27] "UA" "FOLATE"
## [29] "uPCR" "AL_child"
## [31] "log2_TG" "log2_Se"
## [33] "log2_Zn" "log2_FERR"
## [35] "log2_TRFINDEX" "log2_STRF"
## [37] "log2_CTX" "log2_UI"
## [39] "log2_CREA" "log2_HOLOTC"
## [41] "log2_HCY" "log2_MMA"
## [43] "log2_VIT_D" "log2_IGF1"
## [45] "log2_uCa" "log2_uCCR"
## [47] "log2_uICR"3.1.2.2 VN vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VG') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.5809524
## Fit and validate model
modelac <- 'pred_complet_child_all_VN_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
get(modelac)$plot
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'Open code
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.2000000
## lambda 0.0999577
## auc 0.9828614
## auc_optimism_corrected 0.8646881
## auc_optimism_corrected_CIL 0.7430309
## auc_optimism_corrected_CIU 0.9522917
## accuracy 0.9333333
## accuracy_optimism_corrected 0.7781709
## accuracy_optimism_corrected_CIL 0.6363991
## accuracy_optimism_corrected_CIU 0.9004756
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.9551 Min. :0.6565 Min. :-0.01226 Min. :0.8796
## 1st Qu.:0.9943 1st Qu.:0.8333 1st Qu.: 0.09301 1st Qu.:0.9623
## Median :0.9990 Median :0.8721 Median : 0.12419 Median :0.9811
## Mean :0.9957 Mean :0.8647 Mean : 0.13099 Mean :0.9756
## 3rd Qu.:1.0000 3rd Qu.:0.9033 3rd Qu.: 0.16485 3rd Qu.:1.0000
## Max. :1.0000 Max. :0.9931 Max. : 0.34345 Max. :1.0000
## accuracy_resamp_validation accuracy_optimism
## Min. :0.5208 Min. :-0.08761
## 1st Qu.:0.7368 1st Qu.: 0.14786
## Median :0.7838 Median : 0.19982
## Mean :0.7782 Mean : 0.19743
## 3rd Qu.:0.8235 3rd Qu.: 0.24390
## Max. :0.9722 Max. : 0.47917
## Model coefficients
get(modelac)$betas
## 47 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age -0.023259913
## male 0.034639697
## BreastFeed_full_duration .
## BreastFeed_full_stopped .
## BreastFeed_total_stopped -0.187358974
## BiW -0.225673112
## MASS_Perc -0.124980706
## HEIGHT_Perc -0.352304701
## BMI_PERC .
## M_per_H_PERC .
## GLY 0.008439785
## TC .
## HDL 0.037182485
## LDL -0.146417229
## Ca .
## P .
## Mg 0.271171726
## FE .
## TIBC -0.213925698
## TRF -0.321646059
## SATTRF 0.116861635
## HGB -0.139453747
## MCV 0.652158784
## PTH 0.129721503
## P1NP 0.081592097
## UREA -0.258241015
## UA -0.029756305
## FOLATE 0.856689101
## uPCR -1.103703323
## AL_child -0.140009893
## log2_TG .
## log2_Se -0.027280307
## log2_Zn -0.155872826
## log2_FERR -0.665202312
## log2_TRFINDEX 0.250201535
## log2_STRF .
## log2_CTX .
## log2_UI -0.084845177
## log2_CREA -0.372434731
## log2_HOLOTC 0.484119828
## log2_HCY .
## log2_MMA -0.953253014
## log2_VIT_D 0.392936461
## log2_IGF1 -0.355548896
## log2_uCa -0.023014066
## log2_uCCR .
## log2_uICR .3.1.2.3 VG vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VN') %>%
mutate(outcome = if_else(GRP == 'VG', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.4567901
## Fit and validate model
modelac <- 'pred_complet_child_all_VG_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.6000000
## lambda 0.1728403
## auc 0.7979115
## auc_optimism_corrected 0.5908125
## auc_optimism_corrected_CIL 0.3239306
## auc_optimism_corrected_CIU 0.8460016
## accuracy 0.7160494
## accuracy_optimism_corrected 0.5438350
## accuracy_optimism_corrected_CIL 0.3214286
## accuracy_optimism_corrected_CIU 0.7500000
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.2173 Min. :-0.1640 Min. :0.5658
## 1st Qu.:0.9085 1st Qu.:0.5048 1st Qu.: 0.2347 1st Qu.:0.8292
## Median :0.9500 Median :0.5887 Median : 0.3690 Median :0.8902
## Mean :0.9385 Mean :0.5908 Mean : 0.3477 Mean :0.8790
## 3rd Qu.:0.9845 3rd Qu.:0.6815 3rd Qu.: 0.4635 3rd Qu.:0.9452
## Max. :1.0000 Max. :0.9667 Max. : 0.7811 Max. :1.0000
## accuracy_resamp_validation accuracy_optimism
## Min. :0.1429 Min. :-0.1357
## 1st Qu.:0.4682 1st Qu.: 0.2278
## Median :0.5484 Median : 0.3466
## Mean :0.5438 Mean : 0.3352
## 3rd Qu.:0.6167 3rd Qu.: 0.4450
## Max. :0.8750 Max. : 0.8275
## Model coefficients
get(modelac)$betas
## 47 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age .
## male .
## BreastFeed_full_duration .
## BreastFeed_full_stopped .
## BreastFeed_total_stopped .
## BiW .
## MASS_Perc .
## HEIGHT_Perc .
## BMI_PERC .
## M_per_H_PERC .
## GLY .
## TC .
## HDL .
## LDL .
## Ca .
## P .
## Mg .
## FE .
## TIBC .
## TRF .
## SATTRF .
## HGB .
## MCV .
## PTH .
## P1NP .
## UREA .
## UA .
## FOLATE 0.90209325
## uPCR -0.04462848
## AL_child .
## log2_TG .
## log2_Se .
## log2_Zn .
## log2_FERR .
## log2_TRFINDEX .
## log2_STRF .
## log2_CTX .
## log2_UI .
## log2_CREA .
## log2_HOLOTC .
## log2_HCY .
## log2_MMA .
## log2_VIT_D .
## log2_IGF1 .
## log2_uCa .
## log2_uCCR .
## log2_uICR .3.1.2.4 VN vs VG
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'OM') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.622449
## Fit and validate model
modelac <- 'pred_complet_child_all_VN_VG'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 1.00000000
## lambda 0.05240856
## auc 0.92822331
## auc_optimism_corrected 0.70426316
## auc_optimism_corrected_CIL 0.52312653
## auc_optimism_corrected_CIU 0.87598815
## accuracy 0.80612245
## accuracy_optimism_corrected 0.66692062
## accuracy_optimism_corrected_CIL 0.50000000
## accuracy_optimism_corrected_CIU 0.81422697
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism
## Min. :0.7897 Min. :0.3720 Min. :-0.002957
## 1st Qu.:0.9763 1st Qu.:0.6477 1st Qu.: 0.221844
## Median :0.9915 Median :0.7097 Median : 0.275220
## Mean :0.9836 Mean :0.7043 Mean : 0.279333
## 3rd Qu.:0.9991 3rd Qu.:0.7622 3rd Qu.: 0.340465
## Max. :1.0000 Max. :0.9592 Max. : 0.607497
## accuracy_resamp_test accuracy_resamp_validation accuracy_optimism
## Min. :0.6383 Min. :0.3889 Min. :-0.04508
## 1st Qu.:0.9164 1st Qu.:0.6130 1st Qu.: 0.21897
## Median :0.9500 Median :0.6667 Median : 0.27420
## Mean :0.9419 Mean :0.6669 Mean : 0.27501
## 3rd Qu.:0.9796 3rd Qu.:0.7227 3rd Qu.: 0.33346
## Max. :1.0000 Max. :0.9286 Max. : 0.54862
## Model coefficients
get(modelac)$betas
## 47 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age -0.37604128
## male .
## BreastFeed_full_duration .
## BreastFeed_full_stopped .
## BreastFeed_total_stopped .
## BiW .
## MASS_Perc .
## HEIGHT_Perc -0.42563039
## BMI_PERC .
## M_per_H_PERC .
## GLY .
## TC .
## HDL 0.02568811
## LDL -0.01840456
## Ca .
## P .
## Mg .
## FE .
## TIBC .
## TRF .
## SATTRF .
## HGB .
## MCV 0.52833146
## PTH 0.47085025
## P1NP .
## UREA .
## UA .
## FOLATE .
## uPCR -0.79802410
## AL_child .
## log2_TG 0.02731084
## log2_Se .
## log2_Zn .
## log2_FERR -0.06851785
## log2_TRFINDEX 0.30304989
## log2_STRF .
## log2_CTX .
## log2_UI .
## log2_CREA .
## log2_HOLOTC 0.75759266
## log2_HCY .
## log2_MMA -0.49930782
## log2_VIT_D .
## log2_IGF1 -1.00932674
## log2_uCa .
## log2_uCCR .
## log2_uICR .3.1.3 Reduced
Baseline + clinical characteristics not affected with vitamins supplementation
3.1.3.1 Prepare data
Open code
dat_pred <- dat_child_all_imputed %>%
select(GRP, FAM, log2_age, male, BreastFeed_full_duration,
BreastFeed_full_stopped, BreastFeed_total_stopped,
BiW,
MASS_Perc:Mg,
log2_TG:log2_Zn,
HGB:UA,
log2_CTX, log2_CREA, log2_IGF1
)
dat_features <- dat_pred %>% select(-FAM, -GRP) %>% as.matrix
colnames(dat_features)
## [1] "log2_age" "male"
## [3] "BreastFeed_full_duration" "BreastFeed_full_stopped"
## [5] "BreastFeed_total_stopped" "BiW"
## [7] "MASS_Perc" "HEIGHT_Perc"
## [9] "BMI_PERC" "M_per_H_PERC"
## [11] "GLY" "TC"
## [13] "HDL" "LDL"
## [15] "Ca" "P"
## [17] "Mg" "log2_TG"
## [19] "log2_Se" "log2_Zn"
## [21] "HGB" "MCV"
## [23] "PTH" "P1NP"
## [25] "UREA" "UA"
## [27] "log2_CTX" "log2_CREA"
## [29] "log2_IGF1"3.1.3.2 VN vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VG') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.5809524
## Fit and validate model
modelac <- 'pred_complex_child_all_VN_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
get(modelac)$plot
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'Open code
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.0000000
## lambda 0.6809560
## auc 0.8386736
## auc_optimism_corrected 0.6525305
## auc_optimism_corrected_CIL 0.4571037
## auc_optimism_corrected_CIU 0.7993206
## accuracy 0.7809524
## accuracy_optimism_corrected 0.6243409
## accuracy_optimism_corrected_CIL 0.4534091
## accuracy_optimism_corrected_CIU 0.7692308
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.2737 Min. :-0.07665 Min. :0.5701
## 1st Qu.:0.8850 1st Qu.:0.5949 1st Qu.: 0.17870 1st Qu.:0.8076
## Median :0.9131 Median :0.6616 Median : 0.24716 Median :0.8500
## Mean :0.9089 Mean :0.6525 Mean : 0.25639 Mean :0.8454
## 3rd Qu.:0.9421 3rd Qu.:0.7162 3rd Qu.: 0.33753 3rd Qu.:0.8879
## Max. :1.0000 Max. :0.9261 Max. : 0.64381 Max. :1.0000
## accuracy_resamp_validation accuracy_optimism
## Min. :0.3500 Min. :-0.1079
## 1st Qu.:0.5714 1st Qu.: 0.1400
## Median :0.6250 Median : 0.2216
## Mean :0.6243 Mean : 0.2210
## 3rd Qu.:0.6832 3rd Qu.: 0.2973
## Max. :0.8649 Max. : 0.5675
## Model coefficients
get(modelac)$betas
## 29 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age -0.110371269
## male 0.053478581
## BreastFeed_full_duration 0.106491593
## BreastFeed_full_stopped -0.056983757
## BreastFeed_total_stopped -0.130809954
## BiW -0.062498656
## MASS_Perc -0.137379843
## HEIGHT_Perc -0.118087003
## BMI_PERC -0.044998977
## M_per_H_PERC -0.049822380
## GLY 0.010268256
## TC -0.111735304
## HDL 0.025674988
## LDL -0.152029070
## Ca -0.008305915
## P -0.082862856
## Mg 0.201323185
## log2_TG 0.022002611
## log2_Se -0.045072287
## log2_Zn -0.132387542
## HGB -0.084029793
## MCV 0.283508908
## PTH 0.081431561
## P1NP 0.138678669
## UREA -0.289097234
## UA -0.110530244
## log2_CTX 0.097580361
## log2_CREA -0.179034229
## log2_IGF1 -0.2222600633.1.3.3 VG vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VN') %>%
mutate(outcome = if_else(GRP == 'VG', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.4567901
## Fit and validate model
modelac <- 'pred_complex_child_all_VG_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.6000000
## lambda 0.1791455
## auc 0.5000000
## auc_optimism_corrected 0.4245962
## auc_optimism_corrected_CIL 0.2087214
## auc_optimism_corrected_CIU 0.6402794
## accuracy 0.5432099
## accuracy_optimism_corrected 0.4229261
## accuracy_optimism_corrected_CIL 0.2366246
## accuracy_optimism_corrected_CIU 0.5956818
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.1324 Min. :-0.04312 Min. :0.5057
## 1st Qu.:0.8268 1st Qu.:0.3550 1st Qu.: 0.36449 1st Qu.:0.7124
## Median :0.8900 Median :0.4269 Median : 0.46237 Median :0.7975
## Mean :0.8636 Mean :0.4246 Mean : 0.43898 Mean :0.7823
## 3rd Qu.:0.9376 3rd Qu.:0.5000 3rd Qu.: 0.54385 3rd Qu.:0.8593
## Max. :1.0000 Max. :0.7273 Max. : 0.79429 Max. :1.0000
## accuracy_resamp_validation accuracy_optimism
## Min. :0.1613 Min. :-0.1658
## 1st Qu.:0.3585 1st Qu.: 0.2738
## Median :0.4202 Median : 0.3767
## Mean :0.4229 Mean : 0.3594
## 3rd Qu.:0.4839 3rd Qu.: 0.4559
## Max. :0.6786 Max. : 0.7059
## Model coefficients
get(modelac)$betas
## 29 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age 0
## male .
## BreastFeed_full_duration .
## BreastFeed_full_stopped .
## BreastFeed_total_stopped .
## BiW .
## MASS_Perc .
## HEIGHT_Perc .
## BMI_PERC .
## M_per_H_PERC .
## GLY .
## TC .
## HDL .
## LDL .
## Ca .
## P .
## Mg .
## log2_TG .
## log2_Se .
## log2_Zn .
## HGB .
## MCV .
## PTH .
## P1NP .
## UREA .
## UA .
## log2_CTX .
## log2_CREA .
## log2_IGF1 .3.1.3.4 VN vs VG
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'OM') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.622449
## Fit and validate model
modelac <- 'pred_complex_child_all_VN_VG'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.0000000
## lambda 0.7235655
## auc 0.8223305
## auc_optimism_corrected 0.6437341
## auc_optimism_corrected_CIL 0.4675493
## auc_optimism_corrected_CIU 0.7907643
## accuracy 0.7142857
## accuracy_optimism_corrected 0.6341987
## accuracy_optimism_corrected_CIL 0.4736842
## accuracy_optimism_corrected_CIU 0.7765848
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.3273 Min. :-0.0257 Min. :0.6383
## 1st Qu.:0.8980 1st Qu.:0.5858 1st Qu.: 0.2044 1st Qu.:0.8122
## Median :0.9283 Median :0.6504 Median : 0.2744 Median :0.8601
## Mean :0.9211 Mean :0.6437 Mean : 0.2774 Mean :0.8538
## 3rd Qu.:0.9571 3rd Qu.:0.7053 3rd Qu.: 0.3508 3rd Qu.:0.9000
## Max. :1.0000 Max. :0.8878 Max. : 0.5962 Max. :1.0000
## accuracy_resamp_validation accuracy_optimism
## Min. :0.3889 Min. :-0.06015
## 1st Qu.:0.5833 1st Qu.: 0.14928
## Median :0.6389 Median : 0.22152
## Mean :0.6342 Mean : 0.21961
## 3rd Qu.:0.6897 3rd Qu.: 0.29115
## Max. :0.8333 Max. : 0.52632
## Model coefficients
get(modelac)$betas
## 29 x 1 sparse Matrix of class "dgCMatrix"
## s0
## log2_age -0.150337798
## male -0.015480652
## BreastFeed_full_duration -0.040635603
## BreastFeed_full_stopped 0.205739460
## BreastFeed_total_stopped 0.009038895
## BiW -0.031318165
## MASS_Perc -0.091286922
## HEIGHT_Perc -0.162930580
## BMI_PERC 0.012574086
## M_per_H_PERC 0.023907659
## GLY 0.012894483
## TC -0.046958074
## HDL 0.116489341
## LDL -0.205738166
## Ca 0.025643757
## P 0.018195206
## Mg 0.011956604
## log2_TG 0.234485157
## log2_Se -0.020382580
## log2_Zn -0.124665155
## HGB -0.116479185
## MCV 0.240224485
## PTH 0.215368558
## P1NP 0.100360490
## UREA -0.141122560
## UA -0.036142120
## log2_CTX 0.084311132
## log2_CREA -0.148321250
## log2_IGF1 -0.2179917793.2 Adults
3.2.1 Baseline model
Following models will be trained to discriminate between two diets, based solely on AGE and SEX.
3.2.1.1 Prepare data
Open code
dat_pred <- dat_adult_imputed %>%
select(GRP, FAM, AGE, male)
nfam <- dat_pred$FAM %>% factor() %>% levels() %>% length()
dat_features <- dat_pred %>% select(-FAM, -GRP) %>% as.matrix
colnames(dat_features)
## [1] "AGE" "male"
dat_outcome <- dat_pred %>% select(GRP) %>% as.matrix3.2.1.2 VN vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VG') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.6478873
## Fit and validate model
modelac <- 'pred_baseline_adult_VN_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
get(modelac)$plot
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'Open code
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.2000000
## lambda 0.4340878
## auc 0.5000000
## auc_optimism_corrected 0.5423400
## auc_optimism_corrected_CIL 0.3829571
## auc_optimism_corrected_CIU 0.7098909
## accuracy 0.6478873
## accuracy_optimism_corrected 0.6371038
## accuracy_optimism_corrected_CIL 0.4797907
## accuracy_optimism_corrected_CIU 0.7724352
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.4772 Min. :0.2129 Min. :-0.23936 Min. :0.5035
## 1st Qu.:0.5000 1st Qu.:0.5000 1st Qu.: 0.00000 1st Qu.:0.6224
## Median :0.5587 Median :0.5021 Median : 0.00000 Median :0.6572
## Mean :0.5774 Mean :0.5423 Mean : 0.03508 Mean :0.6578
## 3rd Qu.:0.6479 3rd Qu.:0.5990 3rd Qu.: 0.09408 3rd Qu.:0.6901
## Max. :0.7760 Max. :0.7733 Max. : 0.38934 Max. :0.8440
## accuracy_resamp_validation accuracy_optimism
## Min. :0.1695 Min. :-0.39192
## 1st Qu.:0.5889 1st Qu.:-0.06042
## Median :0.6400 Median : 0.01420
## Mean :0.6371 Mean : 0.02068
## 3rd Qu.:0.6897 3rd Qu.: 0.10019
## Max. :0.9130 Max. : 0.43414
## Model coefficients
get(modelac)$betas
## 2 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE 0
## male .3.2.1.3 VG vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VN') %>%
mutate(outcome = if_else(GRP == 'VG', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.4736842
## Fit and validate model
modelac <- 'pred_baseline_adult_VG_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.60000000
## lambda 0.01329009
## auc 0.52333333
## auc_optimism_corrected 0.46703618
## auc_optimism_corrected_CIL 0.30289855
## auc_optimism_corrected_CIU 0.58972902
## accuracy 0.52631579
## accuracy_optimism_corrected 0.45329487
## accuracy_optimism_corrected_CIL 0.26000000
## accuracy_optimism_corrected_CIU 0.62043269
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.4300 Min. :0.1683 Min. :-0.20455 Min. :0.4167
## 1st Qu.:0.5000 1st Qu.:0.4294 1st Qu.: 0.00000 1st Qu.:0.5263
## Median :0.5137 Median :0.5000 Median : 0.01497 Median :0.5579
## Mean :0.5332 Mean :0.4670 Mean : 0.06621 Mean :0.5654
## 3rd Qu.:0.5586 3rd Qu.:0.5000 3rd Qu.: 0.12488 3rd Qu.:0.5895
## Max. :0.7344 Max. :0.7151 Max. : 0.52763 Max. :0.7419
## accuracy_resamp_validation accuracy_optimism
## Min. :0.1515 Min. :-0.19443
## 1st Qu.:0.3889 1st Qu.: 0.02412
## Median :0.4615 Median : 0.09530
## Mean :0.4533 Mean : 0.11213
## 3rd Qu.:0.5238 3rd Qu.: 0.20063
## Max. :0.6897 Max. : 0.55682
## Model coefficients
get(modelac)$betas
## 2 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE -4.12632e-17
## male .3.2.1.4 VN vs VG
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'OM') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.6715328
## Fit and validate model
modelac <- 'pred_baseline_adult_VN_VG'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 1.00000000
## lambda 0.08085931
## auc 0.50000000
## auc_optimism_corrected 0.54883991
## auc_optimism_corrected_CIL 0.40903409
## auc_optimism_corrected_CIU 0.73334865
## accuracy 0.67153285
## accuracy_optimism_corrected 0.66674384
## accuracy_optimism_corrected_CIL 0.52882908
## accuracy_optimism_corrected_CIU 0.80701754
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.4699 Min. :0.1779 Min. :-0.29205 Min. :0.5036
## 1st Qu.:0.5000 1st Qu.:0.5000 1st Qu.: 0.00000 1st Qu.:0.6350
## Median :0.5385 Median :0.5167 Median : 0.00000 Median :0.6752
## Mean :0.5743 Mean :0.5488 Mean : 0.02549 Mean :0.6728
## 3rd Qu.:0.6488 3rd Qu.:0.5996 3rd Qu.: 0.07911 3rd Qu.:0.7122
## Max. :0.7632 Max. :0.8387 Max. : 0.37786 Max. :0.8406
## accuracy_resamp_validation accuracy_optimism
## Min. :0.4048 Min. :-0.353546
## 1st Qu.:0.6205 1st Qu.:-0.070669
## Median :0.6604 Median : 0.012268
## Mean :0.6667 Mean : 0.006068
## 3rd Qu.:0.7143 3rd Qu.: 0.085948
## Max. :0.8679 Max. : 0.435818
## Model coefficients
get(modelac)$betas
## 2 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE 0
## male .3.2.2 Complete model
Baseline + clinical characteristics
3.2.2.1 Prepare data
Open code
dat_pred <- dat_adult_imputed
dat_features <- dat_pred %>% select(-FAM, -GRP) %>% as.matrix
colnames(dat_features)
## [1] "AGE" "male" "BMI" "WHR"
## [5] "HEIGHT" "MASS" "HG" "PFAT"
## [9] "SBP" "DBP" "GLY" "HDL"
## [13] "LDL" "Ca" "P" "Mg"
## [17] "FE" "SATTRF" "HGB" "MCV"
## [21] "PTH" "UA" "FOLATE" "uPCR"
## [25] "AL_adult" "BREAKS" "log2_FAT" "log2_FFM"
## [29] "log2_TC" "log2_TG" "log2_Se" "log2_Zn"
## [33] "log2_TIBC" "log2_FERR" "log2_TRF" "log2_TRFINDEX"
## [37] "log2_STRF" "log2_CTX" "log2_P1NP" "log2_UI"
## [41] "log2_UREA" "log2_CREA" "log2_HOLOTC" "log2_HCY"
## [45] "log2_MMA" "log2_VIT_D" "log2_uCa" "log2_uCCR"
## [49] "log2_uICR"3.2.2.2 VN vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VG') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.6478873
## Fit and validate model
modelac <- 'pred_complet_adult_VN_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
get(modelac)$plot
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'Open code
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.2000000
## lambda 0.1200082
## auc 0.9593478
## auc_optimism_corrected 0.8547846
## auc_optimism_corrected_CIL 0.7490948
## auc_optimism_corrected_CIU 0.9429849
## accuracy 0.8943662
## accuracy_optimism_corrected 0.7744900
## accuracy_optimism_corrected_CIL 0.6525954
## accuracy_optimism_corrected_CIU 0.8800000
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism
## Min. :0.9347 Min. :0.6540 Min. :-0.006052
## 1st Qu.:0.9844 1st Qu.:0.8242 1st Qu.: 0.097296
## Median :0.9904 Median :0.8561 Median : 0.132035
## Mean :0.9890 Mean :0.8548 Mean : 0.134212
## 3rd Qu.:0.9960 3rd Qu.:0.8906 3rd Qu.: 0.170560
## Max. :1.0000 Max. :0.9768 Max. : 0.336612
## accuracy_resamp_test accuracy_resamp_validation accuracy_optimism
## Min. :0.8462 Min. :0.5690 Min. :-0.0005417
## 1st Qu.:0.9429 1st Qu.:0.7358 1st Qu.: 0.1294366
## Median :0.9577 Median :0.7757 Median : 0.1850655
## Mean :0.9559 Mean :0.7745 Mean : 0.1814488
## 3rd Qu.:0.9722 3rd Qu.:0.8148 3rd Qu.: 0.2275641
## Max. :1.0000 Max. :0.9298 Max. : 0.3817387
## Model coefficients
get(modelac)$betas
## 49 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE -0.03062496
## male 0.07229817
## BMI .
## WHR .
## HEIGHT .
## MASS .
## HG 0.10580374
## PFAT .
## SBP .
## DBP .
## GLY -0.23014855
## HDL .
## LDL -0.35487543
## Ca 0.06963822
## P -0.23750534
## Mg .
## FE .
## SATTRF .
## HGB .
## MCV 0.20337668
## PTH 0.20244539
## UA .
## FOLATE 0.59144460
## uPCR -0.91171129
## AL_adult -0.48157456
## BREAKS .
## log2_FAT -0.02620384
## log2_FFM .
## log2_TC -0.40941964
## log2_TG .
## log2_Se -0.17852882
## log2_Zn -0.48824639
## log2_TIBC .
## log2_FERR -0.46820241
## log2_TRF .
## log2_TRFINDEX 0.12147493
## log2_STRF .
## log2_CTX .
## log2_P1NP 0.37553616
## log2_UI -0.18251390
## log2_UREA -0.50687940
## log2_CREA -0.41883894
## log2_HOLOTC .
## log2_HCY .
## log2_MMA .
## log2_VIT_D 0.42509516
## log2_uCa -0.01533056
## log2_uCCR -0.14743311
## log2_uICR -0.131910123.2.2.3 VG vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VN') %>%
mutate(outcome = if_else(GRP == 'VG', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.4736842
## Fit and validate model
modelac <- 'pred_complet_adult_VG_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.2000000
## lambda 0.2626192
## auc 0.8857778
## auc_optimism_corrected 0.6661926
## auc_optimism_corrected_CIL 0.4875733
## auc_optimism_corrected_CIU 0.8360262
## accuracy 0.8315789
## accuracy_optimism_corrected 0.6081130
## accuracy_optimism_corrected_CIL 0.4201613
## accuracy_optimism_corrected_CIU 0.7670814
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.7145 Min. :0.4091 Min. :-0.05124 Min. :0.5851
## 1st Qu.:0.9428 1st Qu.:0.6067 1st Qu.: 0.22871 1st Qu.:0.8737
## Median :0.9696 Median :0.6685 Median : 0.30101 Median :0.9149
## Mean :0.9628 Mean :0.6662 Mean : 0.29657 Mean :0.9075
## 3rd Qu.:0.9903 3rd Qu.:0.7274 3rd Qu.: 0.36386 3rd Qu.:0.9479
## Max. :1.0000 Max. :0.9226 Max. : 0.57402 Max. :1.0000
## accuracy_resamp_validation accuracy_optimism
## Min. :0.2917 Min. :-0.1104
## 1st Qu.:0.5556 1st Qu.: 0.2307
## Median :0.6154 Median : 0.2991
## Mean :0.6081 Mean : 0.2994
## 3rd Qu.:0.6667 3rd Qu.: 0.3700
## Max. :0.8125 Max. : 0.5715
## Model coefficients
get(modelac)$betas
## 49 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE .
## male .
## BMI .
## WHR .
## HEIGHT .
## MASS .
## HG .
## PFAT .
## SBP .
## DBP 0.01422657
## GLY .
## HDL .
## LDL .
## Ca .
## P .
## Mg .
## FE .
## SATTRF .
## HGB .
## MCV -0.01252481
## PTH .
## UA .
## FOLATE 0.44187862
## uPCR -0.11239698
## AL_adult .
## BREAKS .
## log2_FAT .
## log2_FFM .
## log2_TC -0.25571695
## log2_TG .
## log2_Se -0.20245555
## log2_Zn -0.20792583
## log2_TIBC .
## log2_FERR -0.11934472
## log2_TRF .
## log2_TRFINDEX 0.28357998
## log2_STRF 0.12923032
## log2_CTX .
## log2_P1NP .
## log2_UI -0.31195970
## log2_UREA .
## log2_CREA .
## log2_HOLOTC -0.20347552
## log2_HCY 0.02254704
## log2_MMA 0.25541403
## log2_VIT_D 0.01330014
## log2_uCa .
## log2_uCCR .
## log2_uICR -0.371243643.2.2.4 VN vs VG
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'OM') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.6715328
## Fit and validate model
modelac <- 'pred_complet_adult_VN_VG'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.6000000
## lambda 0.1065992
## auc 0.8246377
## auc_optimism_corrected 0.6567516
## auc_optimism_corrected_CIL 0.4814422
## auc_optimism_corrected_CIU 0.8026877
## accuracy 0.6934307
## accuracy_optimism_corrected 0.6550346
## accuracy_optimism_corrected_CIL 0.5288291
## accuracy_optimism_corrected_CIU 0.7858469
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.2786 Min. :-0.02315 Min. :0.6496
## 1st Qu.:0.9106 1st Qu.:0.6135 1st Qu.: 0.21380 1st Qu.:0.8102
## Median :0.9407 Median :0.6630 Median : 0.27586 Median :0.8540
## Mean :0.9337 Mean :0.6568 Mean : 0.27691 Mean :0.8514
## 3rd Qu.:0.9633 3rd Qu.:0.7098 3rd Qu.: 0.33902 3rd Qu.:0.8972
## Max. :1.0000 Max. :0.8824 Max. : 0.70024 Max. :1.0000
## accuracy_resamp_validation accuracy_optimism
## Min. :0.4048 Min. :-0.1458
## 1st Qu.:0.6087 1st Qu.: 0.1284
## Median :0.6545 Median : 0.1922
## Mean :0.6550 Mean : 0.1963
## 3rd Qu.:0.7000 3rd Qu.: 0.2608
## Max. :0.8333 Max. : 0.5588
## Model coefficients
get(modelac)$betas
## 49 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE .
## male .
## BMI .
## WHR .
## HEIGHT .
## MASS .
## HG .
## PFAT .
## SBP .
## DBP -0.4100252897
## GLY .
## HDL .
## LDL -0.0298843600
## Ca .
## P .
## Mg .
## FE .
## SATTRF .
## HGB .
## MCV 0.1445031277
## PTH 0.3573358265
## UA .
## FOLATE .
## uPCR -0.2804439274
## AL_adult -0.1792452846
## BREAKS .
## log2_FAT .
## log2_FFM .
## log2_TC .
## log2_TG .
## log2_Se .
## log2_Zn .
## log2_TIBC .
## log2_FERR .
## log2_TRF .
## log2_TRFINDEX .
## log2_STRF .
## log2_CTX .
## log2_P1NP .
## log2_UI .
## log2_UREA -0.2367771870
## log2_CREA .
## log2_HOLOTC 0.0791558791
## log2_HCY -0.0009791532
## log2_MMA -0.1251544859
## log2_VIT_D .
## log2_uCa .
## log2_uCCR .
## log2_uICR .3.2.3 Reduced model
Baseline + clinical characteristics not affected with vitamin supplementation
3.2.3.1 Prepare data
Open code
dat_pred <- dat_adult_imputed %>%
select(GRP, FAM, AGE, male,
BMI:Mg,
HGB:UA,
AL_adult:log2_Zn,
log2_CTX, log2_P1NP,
log2_UREA:log2_CREA
)
dat_features <- dat_pred %>% select(-FAM, -GRP) %>% as.matrix
colnames(dat_features)
## [1] "AGE" "male" "BMI" "WHR" "HEIGHT" "MASS"
## [7] "HG" "PFAT" "SBP" "DBP" "GLY" "HDL"
## [13] "LDL" "Ca" "P" "Mg" "HGB" "MCV"
## [19] "PTH" "UA" "AL_adult" "BREAKS" "log2_FAT" "log2_FFM"
## [25] "log2_TC" "log2_TG" "log2_Se" "log2_Zn" "log2_CTX" "log2_P1NP"
## [31] "log2_UREA" "log2_CREA"3.2.3.2 VN vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VG') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.6478873
## Fit and validate model
modelac <- 'pred_complex_adult_VN_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
get(modelac)$plot
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'Open code
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 1.00000000
## lambda 0.05634876
## auc 0.87108696
## auc_optimism_corrected 0.78648076
## auc_optimism_corrected_CIL 0.67192971
## auc_optimism_corrected_CIU 0.89560516
## accuracy 0.77464789
## accuracy_optimism_corrected 0.71226811
## accuracy_optimism_corrected_CIL 0.57883333
## accuracy_optimism_corrected_CIU 0.82352941
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.8386 Min. :0.5562 Min. :-0.05388 Min. :0.7376
## 1st Qu.:0.9251 1st Qu.:0.7496 1st Qu.: 0.10153 1st Qu.:0.8389
## Median :0.9403 Median :0.7892 Median : 0.14923 Median :0.8671
## Mean :0.9389 Mean :0.7865 Mean : 0.15240 Mean :0.8661
## 3rd Qu.:0.9557 3rd Qu.:0.8286 3rd Qu.: 0.20162 3rd Qu.:0.8944
## Max. :1.0000 Max. :0.9120 Max. : 0.43843 Max. :0.9930
## accuracy_resamp_validation accuracy_optimism
## Min. :0.4694 Min. :-0.08059
## 1st Qu.:0.6667 1st Qu.: 0.09333
## Median :0.7177 Median : 0.15256
## Mean :0.7123 Mean : 0.15384
## 3rd Qu.:0.7600 3rd Qu.: 0.21353
## Max. :0.8600 Max. : 0.45925
## Model coefficients
get(modelac)$betas
## 32 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE .
## male .
## BMI .
## WHR .
## HEIGHT .
## MASS .
## HG .
## PFAT .
## SBP .
## DBP .
## GLY .
## HDL .
## LDL .
## Ca .
## P .
## Mg .
## HGB .
## MCV .
## PTH .
## UA .
## AL_adult -0.5110666
## BREAKS .
## log2_FAT .
## log2_FFM .
## log2_TC -1.0544345
## log2_TG .
## log2_Se .
## log2_Zn -0.5595906
## log2_CTX .
## log2_P1NP 0.3181682
## log2_UREA -0.9464709
## log2_CREA -0.11490423.2.3.3 VG vs OM
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'VN') %>%
mutate(outcome = if_else(GRP == 'VG', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.4736842
## Fit and validate model
modelac <- 'pred_complex_adult_VG_OM'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.2000000
## lambda 0.6355181
## auc 0.5000000
## auc_optimism_corrected 0.5050453
## auc_optimism_corrected_CIL 0.3253598
## auc_optimism_corrected_CIU 0.6672917
## accuracy 0.5263158
## accuracy_optimism_corrected 0.4761204
## accuracy_optimism_corrected_CIL 0.2872222
## accuracy_optimism_corrected_CIU 0.6461575
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.2125 Min. :-0.1105 Min. :0.5106
## 1st Qu.:0.7939 1st Qu.:0.4512 1st Qu.: 0.2456 1st Qu.:0.6915
## Median :0.8621 Median :0.5064 Median : 0.3411 Median :0.7696
## Mean :0.8357 Mean :0.5050 Mean : 0.3306 Mean :0.7549
## 3rd Qu.:0.9027 3rd Qu.:0.5663 3rd Qu.: 0.4427 3rd Qu.:0.8316
## Max. :1.0000 Max. :0.7593 Max. : 0.7040 Max. :1.0000
## accuracy_resamp_validation accuracy_optimism
## Min. :0.1250 Min. :-0.1480
## 1st Qu.:0.4133 1st Qu.: 0.1940
## Median :0.4848 Median : 0.2881
## Mean :0.4761 Mean : 0.2788
## 3rd Qu.:0.5385 3rd Qu.: 0.3761
## Max. :0.7059 Max. : 0.6260
## Model coefficients
get(modelac)$betas
## 32 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE 0
## male .
## BMI .
## WHR .
## HEIGHT .
## MASS .
## HG .
## PFAT .
## SBP .
## DBP .
## GLY .
## HDL .
## LDL .
## Ca .
## P .
## Mg .
## HGB .
## MCV .
## PTH .
## UA .
## AL_adult .
## BREAKS .
## log2_FAT .
## log2_FFM .
## log2_TC .
## log2_TG .
## log2_Se .
## log2_Zn .
## log2_CTX .
## log2_P1NP .
## log2_UREA .
## log2_CREA .3.2.3.4 VN vs VG
Open code
## Prepare working data
dat_pred_temp <- dat_pred %>%
filter(GRP != 'OM') %>%
mutate(outcome = if_else(GRP == 'VN', 1, 0)) %>%
select(-GRP)
mean(dat_pred_temp$outcome)
## [1] 0.6715328
## Fit and validate model
modelac <- 'pred_complex_adult_VN_VG'
assign(modelac, run(
clustered_glmnet(
data = dat_pred_temp,
outcome = 'outcome',
clust_id = 'FAM',
sample_method = 'atypboot',
N = 500,
alphas = seq(0, 1, by = 0.2),
seed = 368),
path = paste0('gitignore/run/', modelac),
reuse = TRUE))
## Calibration curve
#get(modelac)$plot
## Model summary
t(get(modelac)$model_summary)
## [,1]
## alpha 0.0000000
## lambda 0.7353789
## auc 0.8140097
## auc_optimism_corrected 0.6556023
## auc_optimism_corrected_CIL 0.4932692
## auc_optimism_corrected_CIU 0.7907299
## accuracy 0.7080292
## accuracy_optimism_corrected 0.6500979
## accuracy_optimism_corrected_CIL 0.5185185
## accuracy_optimism_corrected_CIU 0.7659574
get(modelac)$valid_performances %>% summary
## auc_resamp_test auc_resamp_validation auc_optimism accuracy_resamp_test
## Min. :0.5000 Min. :0.3782 Min. :-0.03536 Min. :0.6475
## 1st Qu.:0.8682 1st Qu.:0.6118 1st Qu.: 0.17729 1st Qu.:0.7704
## Median :0.9001 Median :0.6599 Median : 0.23573 Median :0.8058
## Mean :0.8931 Mean :0.6556 Mean : 0.23749 Mean :0.8074
## 3rd Qu.:0.9261 3rd Qu.:0.7059 3rd Qu.: 0.29642 3rd Qu.:0.8456
## Max. :0.9897 Max. :0.8633 Max. : 0.58373 Max. :0.9485
## accuracy_resamp_validation accuracy_optimism
## Min. :0.4048 Min. :-0.08852
## 1st Qu.:0.6042 1st Qu.: 0.09059
## Median :0.6522 Median : 0.14748
## Mean :0.6501 Mean : 0.15732
## 3rd Qu.:0.6949 3rd Qu.: 0.21683
## Max. :0.8077 Max. : 0.46158
## Model coefficients
get(modelac)$betas
## 32 x 1 sparse Matrix of class "dgCMatrix"
## s0
## AGE -0.130969640
## male 0.046847677
## BMI -0.025892148
## WHR -0.071845276
## HEIGHT -0.057146386
## MASS -0.052079055
## HG 0.018645667
## PFAT -0.072209270
## SBP 0.025161999
## DBP -0.218387885
## GLY -0.054679974
## HDL 0.007689509
## LDL -0.110257424
## Ca 0.016184827
## P -0.086481596
## Mg 0.011665361
## HGB 0.007370975
## MCV 0.179203214
## PTH 0.210230420
## UA 0.057920958
## AL_adult -0.204033621
## BREAKS -0.084629183
## log2_FAT -0.065394397
## log2_FFM -0.007507923
## log2_TC -0.078258475
## log2_TG 0.017640442
## log2_Se 0.003677581
## log2_Zn -0.139279629
## log2_CTX 0.105505924
## log2_P1NP 0.094490749
## log2_UREA -0.199522992
## log2_CREA -0.0781046813.3 Results summaries
3.3.1 Prepare data
Open code
models_names <- c(
"pred_baseline_child_all_VN_OM",
"pred_baseline_child_all_VG_OM",
"pred_baseline_child_all_VN_VG",
"pred_complex_child_all_VN_OM",
"pred_complex_child_all_VG_OM",
"pred_complex_child_all_VN_VG",
"pred_complet_child_all_VN_OM",
"pred_complet_child_all_VG_OM",
"pred_complet_child_all_VN_VG",
"pred_baseline_adult_VN_OM",
"pred_baseline_adult_VG_OM",
"pred_baseline_adult_VN_VG",
"pred_complex_adult_VN_OM",
"pred_complex_adult_VG_OM",
"pred_complex_adult_VN_VG",
"pred_complet_adult_VN_OM",
"pred_complet_adult_VG_OM",
"pred_complet_adult_VN_VG"
)
tr <- get(models_names[1])$model_summary[
c(
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"
)
]
for (i in 2:length(models_names)) {
tr <- dplyr::bind_rows(tr, get(models_names[i])$model_summary[
c(
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"
)
])
}
tr <- data.frame(models_names,
y_pos = c(
rep(0.9, 3),
rep(0.8, 3),
rep(0.7, 3),
rep(0.5, 3),
rep(0.4, 3),
rep(0.3, 3)
),
age_group = c(
rep("children", 9),
rep("adults", 9)
),
contrast = str_sub(models_names, -5, -1),
model = str_sub(models_names, 6, 12),
tr
) %>%
mutate(
model = fct_recode(model,
"baseline" = "baselin",
"reduced" = "complex",
"full" = "complet"
)
) %>%
rename(
AUC = auc_optimism_corrected,
CI_L = auc_optimism_corrected_CIL,
CI_U = auc_optimism_corrected_CIU
) %>%
select(
age_group, model, contrast, AUC, CI_L, CI_U, models_names, y_pos
)3.3.2 Models performance summary
Open code
tr %>%
mutate(`AUC [95% CI]` = paste0(
round(AUC, 2),
" [",
round(CI_L, 2),
", ",
round(CI_U, 2),
"]"
)) %>%
select(age_group:contrast, `AUC [95% CI]`) %>%
kableExtra::kable(caption = "Results of elastic net predictive models distinguishing between diet groups. `age_group` indicates the cohort (children or adults). `model` specifies the covariates included: `baseline` includes only basic patient characteristics (age, sex, breastfeeding-related covariates and birth weight for children); `reduced` adds clinical characteristics not primarily driven by supplementation; `full` includes all clinical characteristics. `contrast` denotes the pair of diet groups used for model training: `VN_OM` (vegans vs. omnivores), `VG_VN` (vegetarians vs. vegans), and `OM_VG` (omnivores vs. vegetarians). `AUC [95% CI]` represents the out-of-sample AUC estimated via clustered bootstrap, with corresponding 95% confidence intervals.")| age_group | model | contrast | AUC [95% CI] |
|---|---|---|---|
| children | baseline | VN_OM | 0.54 [0.36, 0.69] |
| children | baseline | VG_OM | 0.57 [0.33, 0.78] |
| children | baseline | VN_VG | 0.59 [0.39, 0.76] |
| children | reduced | VN_OM | 0.65 [0.46, 0.8] |
| children | reduced | VG_OM | 0.42 [0.21, 0.64] |
| children | reduced | VN_VG | 0.64 [0.47, 0.79] |
| children | full | VN_OM | 0.86 [0.74, 0.95] |
| children | full | VG_OM | 0.59 [0.32, 0.85] |
| children | full | VN_VG | 0.7 [0.52, 0.88] |
| adults | baseline | VN_OM | 0.54 [0.38, 0.71] |
| adults | baseline | VG_OM | 0.47 [0.3, 0.59] |
| adults | baseline | VN_VG | 0.55 [0.41, 0.73] |
| adults | reduced | VN_OM | 0.79 [0.67, 0.9] |
| adults | reduced | VG_OM | 0.51 [0.33, 0.67] |
| adults | reduced | VN_VG | 0.66 [0.49, 0.79] |
| adults | full | VN_OM | 0.85 [0.75, 0.94] |
| adults | full | VG_OM | 0.67 [0.49, 0.84] |
| adults | full | VN_VG | 0.66 [0.48, 0.8] |
3.3.3 Comparison of baseline vs more complex models
Open code
baseline <- tr %>% filter(model == "baseline")
full <- tr %>% filter(model == "full")
reduced <- tr %>% filter(model == "reduced")
compmod <- compare_predmod(
get(baseline$models_names[1]),
get(reduced$models_names[1])
)
comp_table <- data.frame(
age_group = baseline$age_group[1],
contrast = baseline$contrast[1],
auc_gain = compmod[1],
CI_L = compmod[2],
CI_U = compmod[3],
P = compmod[4],
models = interaction(baseline$model[1], reduced$model[1])
)
for (i in 2:nrow(baseline)) {
compmod <- compare_predmod(
get(baseline$models_names[i]),
get(reduced$models_names[i])
)
comp_table_tmp <- data.frame(
age_group = baseline$age_group[i],
contrast = baseline$contrast[i],
auc_gain = compmod[1],
CI_L = compmod[2],
CI_U = compmod[3],
P = compmod[4],
models = interaction(baseline$model[i], reduced$model[i])
)
comp_table <- dplyr::bind_rows(comp_table, comp_table_tmp)
i <- i + 1
}
comp_table
## age_group contrast auc_gain CI_L CI_U P
## 1 children VN_OM 0.11529848 -0.09366843 0.2846901 0.23752495
## 2 children VG_OM -0.15032444 -0.41028705 0.1104278 0.24151697
## 3 children VN_VG 0.05133982 -0.15804438 0.2717500 0.55289421
## 4 adults VN_OM 0.24414073 0.04718529 0.4171286 0.01796407
## 5 adults VG_OM 0.03800912 -0.17700521 0.2350962 0.68063872
## 6 adults VN_VG 0.10676236 -0.12723077 0.2916598 0.32135729
## models
## 1 baseline.reduced
## 2 baseline.reduced
## 3 baseline.reduced
## 4 baseline.reduced
## 5 baseline.reduced
## 6 baseline.reduced
dim1 <- nrow(comp_table)
for (i in (dim1 + 1):(dim1 + nrow(baseline))) {
compmod <- compare_predmod(
get(baseline$models_names[i - dim1]),
get(full$models_names[i - dim1])
)
comp_table_tmp <- data.frame(
age_group = baseline$age_group[i - dim1],
contrast = baseline$contrast[i - dim1],
auc_gain = compmod[1],
CI_L = compmod[2],
CI_U = compmod[3],
P = compmod[4],
models = interaction(baseline$model[i - dim1], full$model[i - dim1])
)
comp_table <- dplyr::bind_rows(comp_table, comp_table_tmp)
}
comp_table %>%
mutate(
`AUC gain [95% CI]` = paste0(
round(auc_gain, 2), " [", round(CI_L, 2), ", ", round(CI_U, 2), "]"
),
models = fct_recode(models,
"reduced vs baseline" = "baseline.reduced",
"full vs baseline" = "baseline.full"
)
) %>%
select(
models, age_group, contrast, `AUC gain [95% CI]`, P
) %>%
kableExtra::kable(
caption = "Comparison of elastic net predictive models evaluating the gain in discriminative performance when including additional clinical characteristics. The `models` column compares either `reduced vs baseline` (adding clinical characteristics not primarily driven by supplementation) or `full vs baseline` (adding all clinical characteristics). `age_group` indicates the cohort (children or adults). `contrast` specifies the diet groups used for model training: `VN_OM` (vegans vs. omnivores), `VG_OM` (vegetarians vs. omnivores), and `VN_VG` (vegans vs. vegetarians). `AUC gain [95% CI]` represents the estimated improvement in out-of-sample AUC with a more complex model, with clustered bootstrap providing corresponding 95% confidence intervals. `P` indicates the p-value assessing the statistical significance of the AUC gain."
)| models | age_group | contrast | AUC gain [95% CI] | P |
|---|---|---|---|---|
| reduced vs baseline | children | VN_OM | 0.12 [-0.09, 0.28] | 0.2375250 |
| reduced vs baseline | children | VG_OM | -0.15 [-0.41, 0.11] | 0.2415170 |
| reduced vs baseline | children | VN_VG | 0.05 [-0.16, 0.27] | 0.5528942 |
| reduced vs baseline | adults | VN_OM | 0.24 [0.05, 0.42] | 0.0179641 |
| reduced vs baseline | adults | VG_OM | 0.04 [-0.18, 0.24] | 0.6806387 |
| reduced vs baseline | adults | VN_VG | 0.11 [-0.13, 0.29] | 0.3213573 |
| full vs baseline | children | VN_OM | 0.33 [0.16, 0.51] | 0.0019960 |
| full vs baseline | children | VG_OM | 0.02 [-0.28, 0.35] | 0.9281437 |
| full vs baseline | children | VN_VG | 0.11 [-0.12, 0.37] | 0.3373253 |
| full vs baseline | adults | VN_OM | 0.31 [0.12, 0.51] | 0.0019960 |
| full vs baseline | adults | VG_OM | 0.2 [-0.01, 0.42] | 0.0618762 |
| full vs baseline | adults | VN_VG | 0.11 [-0.12, 0.31] | 0.3173653 |
3.3.4 Visualize
Open code
p1 <- tr %>%
mutate(
contrast = factor(contrast, levels = c("VN_OM", "VG_OM", "VN_VG")),
model = factor(model, levels = c("baseline", "reduced", "full"))
) %>%
ggplot(aes(x = AUC, y = model, color = model)) +
geom_point(size = 5, shape = 18) +
geom_segment(
aes(
x = CI_L, xend = CI_U,
y = model, yend = model
),
linewidth = 1.1
) +
coord_cartesian(xlim = c(0.1, 1)) +
geom_vline(xintercept = 0.5, linetype = "dashed") +
facet_wrap(~ age_group + contrast,
labeller = labeller(
contrast = c(
"VN_OM" = "VN vs OM",
"VG_OM" = "VG vs OM",
"VN_VG" = "VN vs VG"
)
)
) +
theme(
axis.title.y = element_blank(),
legend.position = "bottom"
) +
labs(
x = "Out-of-sample AUC",
color = "Model"
) +
scale_color_manual(values = c(
"baseline" = "grey50",
"full" = "red",
"reduced" = "purple"
))
p1VN vs OM, left), vegetarians vs. omnivores (VG vs OM, middle), and vegans vs. vegetarians (VN vs VG, right). Models include baseline (only basic patient characteristics: age, sex, breastfeeding-related covariates and birth weight for children), reduced (adding clinical characteristics not primarily driven by supplementation), and full (including all clinical characteristics). Points represent out-of-sample AUC estimates, indicating the ability of models to discriminate between diet groups, with lines showing 95% confidence intervals estimated via clustered bootstrap.4 Reproducibility
Open code
sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=cs_CZ.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=cs_CZ.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Prague
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] mice_3.17.0 patchwork_1.2.0 ggrepel_0.9.5 robustlmm_3.3-1
## [5] gridExtra_2.3 pheatmap_1.0.12 performance_0.12.2 quantreg_5.98
## [9] SparseM_1.81 bayesplot_1.8.1 ggdist_3.3.2 kableExtra_1.4.0
## [13] lubridate_1.8.0 corrplot_0.92 arm_1.12-2 MASS_7.3-64
## [17] projpred_2.0.2 glmnet_4.1-8 boot_1.3-31 cowplot_1.1.1
## [21] pROC_1.18.0 mgcv_1.9-1 nlme_3.1-167 openxlsx_4.2.5
## [25] flextable_0.9.6 sjPlot_2.8.16 car_3.1-2 carData_3.0-5
## [29] gtsummary_2.0.2 emmeans_1.10.4 ggpubr_0.4.0 lme4_1.1-35.5
## [33] Matrix_1.7-0 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [37] purrr_1.0.2 readr_2.1.2 tidyr_1.3.1 tibble_3.2.1
## [41] ggplot2_3.5.1 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] splines_4.4.3 later_1.3.0 gamm4_0.2-6
## [4] cellranger_1.1.0 datawizard_0.12.2 rpart_4.1.24
## [7] reprex_2.0.1 lifecycle_1.0.4 rstatix_0.7.0
## [10] lattice_0.22-5 insight_0.20.2 backports_1.5.0
## [13] magrittr_2.0.3 rmarkdown_2.27 yaml_2.3.5
## [16] httpuv_1.6.5 skimr_2.1.5 zip_2.2.0
## [19] askpass_1.1 DBI_1.1.2 minqa_1.2.4
## [22] RColorBrewer_1.1-2 multcomp_1.4-18 abind_1.4-5
## [25] rvest_1.0.2 nnet_7.3-20 TH.data_1.1-0
## [28] sandwich_3.0-1 gdtools_0.3.7 crul_1.5.0
## [31] MatrixModels_0.5-3 svglite_2.1.3 codetools_0.2-19
## [34] xml2_1.3.3 tidyselect_1.2.1 shape_1.4.6
## [37] farver_2.1.0 ggeffects_1.7.0 httpcode_0.3.0
## [40] base64enc_0.1-3 matrixStats_1.3.0 jsonlite_1.8.8
## [43] mitml_0.4-3 ellipsis_0.3.2 ggridges_0.5.3
## [46] survival_3.7-0 iterators_1.0.14 systemfonts_1.0.4
## [49] foreach_1.5.2 tools_4.4.3 ragg_1.2.1
## [52] Rcpp_1.0.13 glue_1.7.0 pan_1.6
## [55] xfun_0.46 distributional_0.4.0 loo_2.4.1
## [58] withr_3.0.1 fastmap_1.2.0 fansi_1.0.6
## [61] openssl_1.4.6 digest_0.6.37 R6_2.5.1
## [64] mime_0.12 estimability_1.5.1 textshaping_0.3.6
## [67] colorspace_2.0-2 utf8_1.2.4 generics_0.1.3
## [70] fontLiberation_0.1.0 data.table_1.15.4 robustbase_0.93-9
## [73] httr_1.4.2 htmlwidgets_1.6.4 pkgconfig_2.0.3
## [76] gtable_0.3.0 htmltools_0.5.8.1 fontBitstreamVera_0.1.1
## [79] scales_1.3.0 knitr_1.48 rstudioapi_0.16.0
## [82] tzdb_0.2.0 uuid_1.0-3 coda_0.19-4
## [85] curl_4.3.2 nloptr_2.0.0 repr_1.1.7
## [88] zoo_1.8-9 sjlabelled_1.2.0 parallel_4.4.3
## [91] pillar_1.9.0 vctrs_0.6.5 promises_1.2.0.1
## [94] jomo_2.7-3 dbplyr_2.1.1 xtable_1.8-4
## [97] evaluate_1.0.0 fastGHQuad_1.0.1 mvtnorm_1.1-3
## [100] cli_3.6.3 compiler_4.4.3 rlang_1.1.4
## [103] crayon_1.5.0 rstantools_2.1.1 ggsignif_0.6.3
## [106] labeling_0.4.2 modelr_0.1.8 plyr_1.8.6
## [109] sjmisc_2.8.10 fs_1.6.4 stringi_1.7.6
## [112] viridisLite_0.4.0 assertthat_0.2.1 munsell_0.5.0
## [115] fontquiver_0.2.1 sjstats_0.19.0 hms_1.1.1
## [118] gfonts_0.2.0 shiny_1.9.1 haven_2.4.3
## [121] broom_1.0.6 DEoptimR_1.0-10 readxl_1.3.1
## [124] officer_0.6.6